Variables Representing Crash Frequencies:

acc  Total number of accidents

inj        Number  of  crashes  that  involve  injuries 

pdo       Number  of  Crashes  with  Property  Damage  Only 

fat       Number  of  Crashes  that  result  in  fatalities 

Variables   Representing  Explanatory   Terms: 

slen  Length of highway section in 1/1000th of a mile

adt  Average  Annual  Daily  Traffic 

its  Number  of  intersections  in  the  section 

lw  Lane width

lwc  Lane width redefined as categorical variable

ops  Paved  shoulder  width 

opsc  Paved  shoulder  redefined  as  categorical  variable 

oups  Unpaved shoulder width

oupsc  Unpaved shoulder redefined as categorical variable

oc  Variable  to  represent  presence  or  absence  of  Curb 

fr  Coefficient  of  Friction 

spd  Speed  Limit 
Statistical Terms:

nb  Negative binomial distribution

θ  Overdispersion factor in nb distribution

GLM  Generalized  Linear  Models 

c  Expected  value  of  y-intercept  of  regression  model 

β  Expected value of regressor coefficient

Df  Degrees  of  Freedom 

AIC  Akaike's Information Criterion

D  Deviance 

Resid  Residual 

SS  Sum  of  Squares 

n  Number  of  observations 

MAD  Mean  Absolute  Deviation 

Std  Err  Standard  Error 

~  Modeled as function of

:  Represents  the  product  term  of  two  variables 

Obs  Observed  /Actual  value 

Pred  Predicted  value 
The accuracy and reliability of crash prediction
models depend on the validity of assumptions made in the
analysis, definition of response variables and proper
representation of influencing factors on the relationships
established between crash frequencies, geometric
characteristics and operational characteristics of highways.

Several crash prediction models were developed in the
past years by investigators. These models have helped the
highway engineers design the highway sections with improved
safety and economy. They were also used in estimating the
overall benefits of highway improvement programs. Several
studies reported in the literature show that accident
prediction models were developed to understand the effect of
cross sectional parameters on crash occurrence.

In this study, crashes that occurred during the years
1988 through 1991 in the State of Florida are analyzed using
statistical methods. The analysis of this large number of
observations produced several important findings. The
accepted belief concerning the distribution of crash data
was found to have several inconsistencies, including
violation of basic assumptions. A fair portion of the
analysis time was spent finding the distribution function
that best represents crash frequency.

Crash rate, a function of crash frequency, section
length and traffic volume, is generally considered the
response variable in crash modeling studies. Crash rate was
found to have a very weak relationship with the explanatory
variables, as indicated by high p-values. Crash frequency was
found to be the ideal response variable, with section length
and traffic volume defined as explanatory variables.
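The difference between the two candidate response variables can be illustrated with a short sketch. The text does not give the exact crash rate formula, so the conventional definition (crashes per million vehicle miles of travel) is assumed here, and the function name and sample numbers are hypothetical.

```python
def crash_rate_per_mvm(crashes, adt, section_miles, years=1.0):
    """Crashes per million vehicle-miles (assumed conventional definition)."""
    # Exposure: vehicles per day * days per year * years * section length.
    exposure = adt * 365.0 * years * section_miles
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return crashes * 1_000_000.0 / exposure


# Hypothetical section: 12 crashes in 4 years, ADT of 8,000, 0.5 miles long.
rate = crash_rate_per_mvm(12, 8000, 0.5, years=4)
print(round(rate, 3))
```

Modeling crash frequency instead keeps the count itself as the response and moves section length and traffic volume to the right-hand side of the model as explanatory variables.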

The use of certain transformation functions, as
recommended by most past studies, was found to degrade
model quality. Several experiments were
done to find the best transformation function to represent
the model parameters. The relationship between cross
sectional variables and crash frequency was found to be
nonlinear.
The results of this analysis can be used to estimate
the  contribution  of  each  design  parameter  in  the  expected 
crash  frequency  during  any  given  time  period  for  a  two-lane 
urban  highway  section.   The  effect  of  each  variable  on  total, 
PDO  (Property  Damage  Only),  injury  and  fatal  crashes  can  be 
estimated  using  the  models.   The  procedure  used  in  the 
analysis  can  be  used  to  study  other  types  of  highways. 
CHAPTER 1
INTRODUCTION 

Background 

Crash  analysis  has  become  a  much  easier  task  in  the 
recent  past  since  the  state  authorities  have  started  taking 
solid  steps  in  accident  data  collection  and  management.  Why 
do  crashes  occur?  How  often  do  they  happen?  Can  they  be 
measured?  If  they  can  be  measured,  can  they  be  predicted?  If 
they  cannot  be  measured,  are  they  completely  random?   Many 
attempts  have  been  made  by  investigators  from  various 
professional  backgrounds  to  find  answers  to  such  questions. 

Several studies by investigators from various
professional backgrounds, including highway engineering,
transportation planning, statistics and mathematics, have
sought relationships between highway geometric parameters
and safety and have resulted in valuable conclusions.

The  Poisson  distribution  is  widely  used  to  represent 
the  crash  rate.   The  most  commonly  used  predictors  are  lane 
width,  shoulder  width  and  traffic  volume.  Some  of  the 
parameters  that  are  believed  to  affect  highway  crashes  are 
lane  width,  shoulder  width,  median  width,  surface 
characteristics and slopes, drainage condition, horizontal
curvature, vertical curvature, sight distance, presence of
narrow  bridges,  type  of  access  control,  lighting  conditions 
and  width  of  clear  roadside  recovery  area. 

Though  a  number  of  studies  have  been  done  in  this  area, 
several  assumptions  and  concepts  concerning  the  relationships 
established  were  not  validated.   A  close  observation  of  these 
models  showed  that  they  lack  any  kind  of  uniformity. 
Inconsistencies  were  found  in  the  distribution  assumed,  the 
parameters  considered  for  analysis,  the  functions  used  for 
variable  transformation,  the  variables  used  in  the  final 
prediction  model,  and  the  coefficient  of  parameters. 

Scope 

The  scope  of  this  study  includes  review  of  studies  done 
in  the  past  that  are  relevant  to  highway  safety  and  highway 
crash  modeling.   Information  obtained  from  the  review  of 
literature  is  used  as  guidelines  in  the  initial  stages  of 
analysis.

The  second  stage  of  the  study  is  the  preparation  of 
data.   The  crash  data  used  in  the  analysis  consists  of  all 
reported  accidents  that  occurred  during  the  period  of  1989- 
1992  on  two-lane  urban  highways  within  the  State  of  Florida. 
Most  of  the  parameters  that  are  directly  or  indirectly 
related  to  highway  design  are  included  in  the  database. 


Basic  assumptions  concerning  crash  distributions  should 
be  validated  based  on  actual  data.   One  entire  chapter  is 
dedicated  to  identifying  the  appropriate  distribution 
function  for  representing  crash  frequency.  Other  issues 
related  to  defining  crash  rate  as  the  dependent  variable  are 
also  reviewed. 

The  regression  analysis  and  the  search  for  a  model  are 
usually exhaustive. To avoid a large number of random
searches, an analysis strategy has been developed. As part
of  the  strategy,  certain  norms  are  also  developed  for  testing 
the  models.   All  assumptions  made  based  on  literature  review 
and  known  concepts  are  validated  or  rejected  based  on 
statistical  inference.  The  models  reviewed  are  further 
diagnosed  to  find  desirable  design  features. 

Objectives 

The  primary  objective  of  this  study  is  to  find  the 
effect  of  highway  geometric  and  highway  operational 
parameters on crash frequency. The effects of these
parameters on total, PDO (Property Damage Only), injury and
fatal crashes are estimated.

There  are  some  intermediate  objectives  that  will 
address  some  key  issues  and  form  the  foundation  of  the 
analysis. Finding the best distribution function to
represent crash distribution, identifying the best form of
independent variables and identifying the interactions among
independent variables are intermediate objectives.

Another  intermediate  objective  of  the  study  is  to  find 
out  whether  uniformity  of  highway  design,  represented  by 
section  length,  contributes  to  crash  rate.  If  it  does,  then 
crash  frequency  should  be  considered  as  the  dependent 
variable  and  section  length  as  one  of  the  independent 
variables.

Layout  of  this  Report 

The  results  of  studies  done  in  the  past  are  briefly 
discussed in the 2nd chapter, titled "Literature Review."
Most  of  the  references  are  made  to  studies  done  in  the 
United  States  or  Canada.   A  few  studies  done  in  other 
developed  countries  are  also  included  in  the  review  of 
literature.

The  assumptions  concerning  distribution  of  crash 
frequency  are  evaluated  and  the  results  are  discussed  in 
Chapter  4.   Three  experiments  are  done  on  crash  frequency  to 
test  the  validity  of  accepted  assumptions  and  to  identify 
the  ideal  distribution  function. 

All  available  explanatory  variables,  their  ranges  and 
limits are discussed in Chapter 5. Data statistics like
mean, median, mode and quartiles are calculated and
graphical methods are used to visualize the raw data from
various perspectives.

Chapter  6  gives  an  introduction  to  the  Generalized 
Linear  Models  and  briefly  discusses  the  parameters  that  are 
used  for  variable  transformation  and  model  selection. 
Statistical  methods  used  for  organized  regression  analysis 
and  norms  developed  for  testing  the  developed  models  are 
discussed  in  this  chapter.   The  information  gathered  from 
literature  is  used  to  develop  the  base  model. 

Chapter  7  consists  of  the  summaries  of  all  stages  of 
regression  analysis  done  starting  from  the  base  model  to  the 
final  model.  Each  independent  variable  is  analyzed 
separately  and  an  intermediate  model  is  selected  at  the 
conclusion  of  each  stage  of  analysis.   The  order  in  which 
the  variables  are  analyzed  is  prioritized  based  on  the 
relative  importance  of  each  parameter  in  the  model  as 
suggested  by  the  base  model.   This  chapter  is  concluded  with 
the  selection  of  the  final  model. 

Chapter 8 presents the final models. The final model
selected in chapter 7 was developed from 75% of the data;
the other 25% of the data were used for model testing at
each stage. Once the final model is identified, the analysis
data and test data are combined and the final model is
updated using all data. Three other models are developed
using the same procedure to represent injury crashes, PDO
crashes and fatal crashes, respectively. These models, along
with all the model parameters, are displayed in this chapter.
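The 75%/25% partition described above can be sketched as follows. The text does not say how the records were assigned to each part, so a seeded random shuffle is assumed here; the function name and the stand-in records are hypothetical.

```python
import random


def split_analysis_test(records, analysis_fraction=0.75, seed=0):
    """Randomly partition records into an analysis set and a test set."""
    shuffled = list(records)               # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(round(analysis_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


sections = list(range(1000))  # stand-ins for highway-section records
analysis, test = split_analysis_test(sections)
print(len(analysis), len(test))  # 750 250
```

After model selection, the two parts would simply be concatenated again before the final refit on all data.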

Crash  frequencies  are  computed  at  different  levels  of 
each  independent  variable.  The  calculated  crash  frequencies 
are  plotted  and  displayed  in  this  chapter.   These  figures 
can  be  used  to  understand  the  prevailing  trends  in  the 
relationships  established.   While  preparing  the  plot  for  one 
variable,  all  other  variables  are  held  constant  at  their 
median  values. 

In chapter 9, the categorical representation of cross
sectional variables revealed trends, which are
displayed in this chapter and recommended for future
studies. Lane width, paved shoulder width and unpaved
shoulder width, when treated as categorical variables, showed
that there are specific values of these parameters at which
the crash frequency is at a minimum.

The  conclusions  and  limitations  of  this  study  are 
briefly  discussed  in  chapter  10  and  the  listing  of  all 
literature  reviewed  is  given  in  References. 


CHAPTER  2 
REVIEW  OF  LITERATURE 

The  results  of  studies  done  in  the  past  on  safety 
analysis  are  categorized  by  parameters  and  discussed  briefly 
in  the  following  sections. 

Lane  Width 

The Florida Green Book^ specifies on page III-25 that
"Traffic lanes should be 12 feet in width, but shall not be
less than 10 feet in width. Streets and highways with
significant truck traffic should have 12 feet wide traffic
lanes."

Lane  width  was  found  to  affect  crash  rates  on  rural 
two-lane  highways,  particularly  run-off-road,  opposite 
direction and sideswipe crash rates. Jorgensen^ reviewed
fifteen studies that were conducted before 1978 and dealt with
the effect of lane width on safety. Eight of these studies
showed  that  crash  rates  decreased  as  the  lane  width  increased 
for  rural  two-lane  highways.   Another  study  showed  that  crash 
rates  for  highways  with  12  feet  wide  lanes   did  not  differ 
significantly  from  those  for  highways  with  11  feet  wide 
lanes.   Two  other  studies^  found  no  relationship  between 
crash rates and lane width for rural two-lane highways.
Three  studies  on  two-lane  urban  arterials  could  not  find 
relationships  between  roadway  width  and  crashes. 

The  following  is  a  summary  of  the  most  important 
findings  of  the  studies  on  two-lane  rural  highways  that  were 
reviewed by Jorgensen^.

•  Gupta  and  Jain^  used  multiple  linear  regression  analyses 
to  investigate  the  effects  of  roadway  width  on  crash 
rates.   Increasing  roadway  width  was  found  to  reduce 
multiple-vehicle  crashes.   The  roadway  width  was  found  to 
have  no  effects  on  crash  rates  at  AADT  higher  than  3000 
vehicles  per  day. 

•  Dart  and  Mann^  found  that  crash  rates  decreased  as  lane 
width  increased  up  to  11  feet,  then  remained  relatively 
constant.

•  Cope^  showed  from  a  before  and  after  study  a  significant 
decrease  in  crash  rates  when  widening  lanes  from  9  feet  to 
12  feet,  especially  at  high  crash  sections. 

•  Shah  found  a  definitive  relationship  between  pavement 
width  and  crash  rate.   The  results  showed  that  22  feet  to 
24  feet  wide  pavements  had  fewer  crashes  than  narrower  and 
wider  pavements. 

• Shannon and Stanley^ studied the relationship of
construction cost, maintenance cost and crash costs as
related to paved width. The analysis revealed a general
tendency for crash rates to decline as pavement width
increased.

For two-lane urban arterial streets, Gupta and Jain^,
Head*^ and Mulinazzi^ could not find a relationship
between crash rates and lane width.

Silyanov^"  evaluated  international  studies  on  two-lane 
two-way  highways  and  found  that  crash  rates  decreased  as 
pavement  width  increased  for  pavement  widths  between  13  feet 
and  30  feet.   On  wide  pavements,  the  crash  reduction  due  to 
improvement  was  lower  than  that  on  narrow  pavements.   Based 
on several international studies, Choueeiri et al.^^
concluded that a significant decrease in crash rates could be
expected  by  increasing  pavement  width  up  to  about  25  feet. 
A study in Australia by McLean^^ showed that the most
safety effective lane width is about 3.4 meters (11 feet).
McCarthy^^ showed from a before and after study that
widening lanes on 17 sites from 2.7 meters and 3.0 meters (9
feet and 10 feet) to 3.4 meters and 3.7 meters (11 feet and 12
feet) resulted in a reduction in crash rate by 22%. However,
Choueeiri et al.^^ reported results from a previous study
that show, contrary to expectation, that crash
severity increases as pavement width increases. They
suggested that the reason for this might be the higher
operating speed on the sections that have wider pavements.
Zegeer" found that the only crashes that can be
expected to decrease with lane widening were run-off-the-road
(ROR)  crashes  and  opposite-direction  (OD)  crashes.   He  also 
found  that  only  property-damage  crashes  and  injury  crashes 
decreased  as  lane  width  increased  with  no  change  in  fatality 
rate.  Very  little  additional  benefit  was  realized  by  widening 
a  lane  beyond  11  feet. 

In  that  study,  an  economic  analysis  was  conducted  to 
determine  the  expected  cost  effectiveness  of  lane  widening. 
Savings  due  to  crash  reductions  were  the  only  benefits 
included in the analysis. It was concluded that an 11-foot-wide
lane may be optimal for rural two-lane roadways.

Zegeer and Deacon" reviewed 30 studies performed until
the mid-1980s and concluded that no satisfactory
quantitative model relating crash rates to lane width and
shoulder  width  could  be  found.   Therefore,  they  calibrated  a 
new  model  that  estimates  the  most  likely  relationships  of 
crashes  with  lane  width,  shoulder  width  and  shoulder  type  on 
two-lane  rural  highways.   This  model  was  derived  using  data 
obtained  from  four  previous  studies. 

AR = 4.15 (0.8907)^{L} (0.9562)^{S} (1.0026)^{LS} (0.9403)^{P} (1.004)^{LP}

where,

L = lane width in feet;
S = shoulder width in feet (including stabilized and
unstabilized components);
P = width in feet of stabilized component of shoulder
(0 <= P <= S), P = 0 for unstabilized shoulders and P = S for
full-width stabilization; and
AR = number of ROR and OD crashes per million vehicle miles.
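The model above can be evaluated directly, as in the following sketch. The exponents are garbled in the source, so the code assumes they are L, S, L·S, P and L·P in the order the factors appear; the function name is hypothetical.

```python
def ror_od_crash_rate(lane_ft, shoulder_ft, stabilized_ft):
    """ROR and OD crashes per million vehicle miles from the quoted model.

    Assumes the exponents are L, S, L*S, P and L*P, in that order.
    """
    if not 0 <= stabilized_ft <= shoulder_ft:
        raise ValueError("P must satisfy 0 <= P <= S")
    L, S, P = lane_ft, shoulder_ft, stabilized_ft
    return (4.15 * 0.8907 ** L * 0.9562 ** S * 1.0026 ** (L * S)
            * 0.9403 ** P * 1.004 ** (L * P))


# Compare 10 ft and 12 ft lanes with a 6 ft unstabilized shoulder.
for lane in (10, 12):
    print(lane, round(ror_od_crash_rate(lane, 6, 0), 3))
```

Under this reading, widening the lane with the shoulder held fixed lowers the predicted rate, which agrees with the direction of the findings reported in this section.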

The authors recognized that many assumptions were made
in the development of the above model. They considered the
model  as  a  first  approximation  of  the  effect  of  lane  and 
shoulder  conditions  on  crash  rates.  No  attempt  was  made  to 
determine  the  pavement  widths  that  should  be  used  under 
various  traffic  conditions  or  roadway  classes. 

Later,  Zegeer  et  al.  ^^  developed  another  model  to 
quantify  the  benefits  of  shoulder  and  lane  improvements  based 
on  data  selected  from  seven  states.   Only  two-lane  roadway 
sites  were  selected.   The  crash  types  that  appeared  to  be 
highly  correlated  with  lane  width,  shoulder  width  and 
shoulder  type  were  single  vehicle  (fixed  object,  rollover  and 
run-off-the-road  crashes),  head-on,  and  sideswipe  (opposite- 
direction  and  same-direction)  crashes.   Using  regression 
analysis,  the  following  model  was  derived. 
AO = 0.0019 (AADT)^{0.8824} (0.8786)^{W} (0.9192)^{PA} (0.9316)^{UP} (1.2365)^{H} (0.8822)^{TER1} (1.3221)^{TER2}

where,

TER1 = 1 if flat, 0 otherwise;
TER2 = 1 if mountainous, 0 otherwise;
PA = average paved shoulder width in feet;
UP = unpaved shoulder width in feet;
H = median roadside (or hazard) rating;
W = lane width in feet;
AO = the number of related crashes per mile (single vehicle,
head-on and sideswipe); and
AADT = average annual daily traffic.

The above study indicates that as the amount of lane
widening increases, the percentage reduction in related
crashes also increases. For lane widths between 8 and 12
feet, the first foot of lane widening corresponds to a 12%
reduction in related crashes, two feet to a 23% reduction,
three feet to a 32% reduction and four feet to a 40%
reduction.
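These percentages follow from the lane width factor in the model, as the sketch below shows. It assumes that 0.8786 is the base raised to the lane width W (a reading of the garbled equation) and that all other variables are held fixed; the names are hypothetical.

```python
LANE_WIDTH_FACTOR = 0.8786  # assumed per-foot factor on lane width W


def pct_reduction(feet_widened):
    """Percent reduction in related crashes for a given lane widening."""
    return 100.0 * (1.0 - LANE_WIDTH_FACTOR ** feet_widened)


for feet in (1, 2, 3, 4):
    print(feet, round(pct_reduction(feet)))
```

Under this reading the reductions of 12%, 23%, 32% and 40% correspond to 1, 2, 3 and 4 feet of widening, the last being the most that fits within the model's 8 to 12 foot range.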

The  above  model  only  applies  to  two-lane  rural  highways 
with  lane  widths  of  8  to  12  feet,  shoulder  width  of  zero  to 
12  feet  (paved  or  unpaved)  and  traffic  volumes  of  100  to 
10,000  (AADT).   This  model  was  used  to  develop  an 
informational  guide^^  that  enables  estimation  of  safety 
benefits  of  various  roadway  and  roadside  improvements. 

Goldstine^^  conducted  a  before  and  after  analysis  on 
twenty  five  projects  covering  152  miles  of  road  to  examine 
the  effect  of  road  and  shoulder  widening  on  crash  rates  in 
New  Mexico.   Reductions  of  38%  to  53%  in  crash  rate  were 
observed.   The  study  supported  the  TRB  Special  Report  214  in 
its  recommendation  that  the  higher  the  AADT  the  wider  the 
road should be. However, the study recommended using even
greater  minimum  widths. 

More  recently,  Garber  and  Joshua^^  developed  a  logistic 
regression  model  to  describe  the  probability  of  truck 
involvement  in  crashes  as  a  linear  logistic  function  of 
traffic  and  highway  variables.    For  undivided  two  and  four 
lane  highways,  the  most  significant  variables  were  the  slope 
change  rate,  lane  width  and  to  a  lesser  extent  shoulder 
width.   The  model  derived  for  these  types  of  highways  is 
given  below. 
P = 1 / (1 + e^{-\beta})

\beta = 13.648 - 1.164 LW - 0.9095 SW - 0.1969 SCR + 0.0501 SW \cdot SCR

where,

SW = shoulder width in feet,
SCR = slope change rate, the rate at which the longitudinal
slope changes,
LW = lane width in feet, and
P = probability of large truck crash involvement.
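A minimal sketch of evaluating this model follows. The exponent in the source equation is garbled, so the standard logistic form P = 1 / (1 + exp(-beta)) is assumed; the function name and the sample inputs are hypothetical.

```python
import math


def truck_involvement_probability(lane_ft, shoulder_ft, slope_change_rate):
    """Logistic probability of large truck crash involvement.

    Assumes P = 1 / (1 + exp(-beta)); the sign of the exponent is not
    legible in the source and is an assumption.
    """
    beta = (13.648
            - 1.164 * lane_ft
            - 0.9095 * shoulder_ft
            - 0.1969 * slope_change_rate
            + 0.0501 * shoulder_ft * slope_change_rate)
    return 1.0 / (1.0 + math.exp(-beta))


# Hypothetical comparison: 10 ft versus 12 ft lanes, 6 ft shoulder, SCR of 0.
for lane in (10, 12):
    print(lane, truck_involvement_probability(lane, 6, 0))
```

With this sign convention the probability falls as lane width increases, since the lane width coefficient is negative.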

Shoulder Width

The FDOT Green Book^ specifies that "The width of all
shoulders should, ideally, be at least 10 feet in width.
Where economical or practical constraints are severe, it is
permissible, but not desirable, to reduce the shoulder width.
Outside shoulders shall be provided on all streets and
highways with open drainage and should be at least 6 feet
wide. Facilities with a heavy total traffic volume or a
significant volume of truck traffic should have outside
shoulders at least 8 feet wide."

Previous  studies  that  investigated  the  effect  of 
shoulder  width  on  safety  dealt  with  two-way  rural  highways. 
Zegeer" reviewed some of these studies and found a lack of
correlation between shoulder width and crash rate
on two-lane rural highways with AADT less than 2000 vehicles
per day. Wide shoulders appeared to be most beneficial where
AADT is between 3000 and 5000. In general, shoulders 4-7
feet wide were preferred to wider ones, although some studies
suggested that shoulders as wide as 10-12 feet are the
safest.

Cirillo and Council^" concluded from reviewing several
studies  that  increasing  shoulder  width  up  to  1.8  meters  (6 
feet)  wide  on  facilities  with  AADT  greater  than  1000  improved 
safety.   However,  the  benefits  of  increasing  shoulder  width 
above  1.8  meters  (6  feet)  were  not  clear. 

A  study  in  Oregon^^  concluded  that  total  crashes 
increased  with  increasing  shoulder  width  except  for  roads 
that  have  AADT  between  3600  and  5500.   Shoulders  wider  than  8 
feet  experienced  significantly  higher  crash  rates  than 
shoulders  less  than  8  feet  wide. 
Heimbach et al.^^ concluded that highway sections that
have paved shoulders are associated with lower crash rates
than otherwise identical sections that do not have shoulders.

Rogness  et  al.  "  compared  crash  frequency  for  the  time 
before  and  after  shoulder  widening.   They  found  that  the 
addition  of  full-width  paved  shoulders  to  a  two-lane  roadway 
was  effective  in  reducing  the  total  number  of  crashes.  For 
AADT  less  than  3000,  they  recommended  a  paved  shoulder  in 
place  of  an  additional  travel  lane.   Adding  paved  shoulders 
reduced  crash  rate  by  55%  for  AADT  between  1000  and  3000, 
21.4%  for  AADT  between  3000  and  5000;  and  0%  for  AADT  between 
5000  and  7000. 

Zegeer^^  found  that  ROR  and  OD  crashes  decreased  as 
shoulder  width  increased  up  to  9  feet  for  two-lane  rural 
highways.   For  10-12  feet  wide  shoulders,  there  was  a  slight 
increase  in  these  crash  rates. 

Shoulder Type

The possibility of a vehicle skidding out of control or
turning over is expected to increase when the shoulder is
soft or is covered with loose gravel, sand or mud. The FDOT
Green Book^ (page V-3) specifies that shoulders "should be
capable of providing a safe path for vehicles traveling at
the roadway speed." It also specifies that "the shoulder
should be designed and constructed to provide a firm and
uniform  surface,  capable  of  supporting  vehicles  in  distress." 

Turner et al.^^ compared the crash experience on three
types  of  undivided  highways:  two-lane  with  unpaved  shoulder, 
two-lane  with  paved  shoulder  and  four-lane  with  unpaved 
shoulder.   Two-lane  roadway  with  paved  shoulder  was  found  to 
be  the  safest  and  two-lane  with  no  shoulder  was  found  to  be 
the  least  safe. 

In  general,  shoulder  paving  or  stabilization  is 
desirable  if  conducted  properly.   Zegeer^^  reported  that  the 
effectiveness  of  shoulder  stabilization  depends  on  the  need 
for  improvement  from  a  safety  standpoint.   Based  on  crash 
data  from  Ohio,  they  found  that  shoulder  stabilization  can 
reduce  crashes  by  38%  and  injury  and  fatality  crashes  by  46%. 
In  another  study  using  crash  data  from  North  Carolina,  they 
found  that  for  two-lane  rural  highways,  unpaved  shoulders 
resulted  in  higher  crash  rate  and  severity  than  paved 
shoulders.

Foody and Long" performed a series of analyses of
variance  (ANOVA)  which  revealed  that  the  differences  in  crash 
rate  between  stabilized  shoulders  and  paved  shoulders  were 
not  significant.   However,  the  crash  rate  of  sections  having 
these  two  shoulder  types  was  significantly  less  than  that  of 
sections  that  have  unstabilized  shoulders. 
Speed Limit

The FDOT Green Book^ recommends that "the design speed
should  not  be  less  than  the  expected  posted  or  legal  speed 
limit.   A  design  speed  5  to  10  mph  greater  than  the  posted 
speed  limit  will  compensate  for  a  slight  (and  generally  not 
enforceable)  overrunning  of  the  speed  limit  by  many  drivers." 

Jackobsberg  and  Danchik^^  investigated  the  effect  of 
speed  limit  on  the  safety  of  Maryland  roads.   They  could  not 
find  any  first  order  linear  relationships  between  crashes  and 
physical  characteristics  of  highways  including  speed  limits. 

Fieldwick  and  Brown^^  compared  the  crash  rates  and  speed 
limits  at  21  counties.   Speed  limits  in  those  counties  varied 
between 80 km/hr (50 mph) and 120 km/hr (75 mph). Using
regression  analysis,  they  showed  that  safety  is  sensitive  to 
speed  limit.   For  example,  their  results  suggested  that 
reducing  rural  speed  limit  from  100  km/hr  (62  mph)  to  90 
km/hr  (56  mph)  could  reduce  fatalities  and  injuries  by  11% 
and  15%,  respectively.  The  authors  admitted  that  these 
figures  might  include  other  factors  (safety  measures  employed 
by  counties  that  use  lower  speed  limits)  not  investigated  in 
this  study.   In  addition,  the  study  did  not  differentiate 
between  highway  classes  (freeways,  two-way  two-lane,  etc.). 
Therefore,  their  results  should  be  viewed  with  caution. 

Fieldwick  and  De  Beer^°  analyzed  a  monthly  crash  time 
series  between  January  1972  and  December  1985.   The  results 
showed that a reduction in the urban speed limit from 60
km/hr (37 mph) to 50 km/hr (31 mph) would reduce fatal and
injury  crashes  by  12.3%  and  14.3%,  respectively. 

In  Texas,  speed-zoning  procedures  rely  primarily  on  the 
85th percentile speed of traffic on a facility. Ullman and
Dudek^^ investigated the argument that speed zoning below the
85th percentile may be beneficial to drivers in rapidly
developing areas. Spot speed, speed profile and crash data
were collected before and after the speed limits at six urban
fringe highway sites in Texas were reduced from 55 mph (the
85th percentile speed) to 45 mph. No changes were observed in
speeds,  speed  distributions,  speed  changing  activities  or 
crash  rates  at  the  sites.   They  concluded  that  the  lower 
speed  zones  were  not  effective  in  improving  safety  at  the 
investigated  sites. 

Garber et al.^^ investigated the effect of the design
speed  and  the  posted  speed  limit  on  safety.   The  types  of 
highways  included  in  the  study  were  urban  interstates,  rural 
interstates,  urban  arterials,  rural  arterials  and  major  rural 
collectors.   Thirty-six  different  locations  in  Virginia  were 
selected  for  the  study.   They  found  that  the  average  speeds 
on  these  highways  depend  on  design  speeds.   An  attempt  was 
made  to  correlate  crash  rates  with  average  speed  for  the 
different  types  of  highway.   No  strong  correlation  was  found 
between crash rates and average speed for any given type of
highway. 

They  also  found  that  drivers  tend  to  travel  at  higher 
speeds  on  highways  with  better  geometric  characteristics 
regardless  of  the  posted  speed  limit.   The  speed  variance  was 
found  to  be  a  function  of  the  difference  between  the  design 
speed  and  the  posted  speed  limit.   Results  of  regression 
analysis showed that the speed variance was at a minimum when
this  difference  was  between  5  and  10  mph.   The  regression 
analysis  also  showed  that  crash  rates  increase  with 
increasing  speed  variance  for  all  classes  of  roads. 


CHAPTER  3 
DATA  ORGANIZATION 


Highway  Geometric  Data 

The  Florida  Department  of  Transportation  gathers  and 
maintains  information  pertaining  to  all  highways  and  streets 
in  the  State  of  Florida.   This  database  is  known  as  the  RCI 
data (Roadway Characteristic Inventory). Each record or line
of  information  in  the  RCI  database  represents  one  highway 
section. 

Some  of  the  relevant  items  in  these  records  are 
location  code  representing  the  begin  and  end  points  of  the 
highway,  lane  width,  paved  shoulder  width  and  unpaved 
shoulder  width,  shoulder  type,  traffic  volume,  speed  limit, 
number  of  intersections,  presence  of  raised  curb  and  friction 
factor.   The  information  available  about  each  highway  section 
in  the  RCI  data  is  broadly  classified  based  on  location, 
highway  type  and  General  characteristics  of  the  highway. 

Location : 

The  location  code  is  the  first  nine  digits  of  each 
record,  designed  to  geographically  identify  the  highway 
section. The location code includes county number, highway
number and mile point. Two location codes are assigned to
each highway section, one representing the beginning and the
other representing the end of the section. The length of the
section may be calculated as the difference between the
begin-mile point and the end-mile point.

Highway type:

Several  numeric  codes  are  used  to  represent  the  highway 
type.   Access-control,  number  of  lanes,  presence  of  median 
and  number  of  directions  in  which  traffic  moves  are  some  of 
them.   The  highway  type  is  recognized  based  on  this 
information. For example, if the number of lanes is 2,
presence of median is 0, number of directions in which
traffic flows is 2, access control is 0, and type of location
is 1, that record represents a two-lane, urban, undivided
highway section.

General    characteristics : 

The  characteristics  of  a  highway  section  that  play  an 
important  role  in  this  study  include  cross  sectional  design 
features  like  lane  width,  shoulder  width,  shoulder  type,  and 
operational  parameters  like  speed  limit,  presence  of  on- 
street parking and traffic volume.

Highway Accident Data
The  Florida  Department  of  Transportation  also  maintains 
another  database  that  consists  of  all  measurable  and 
representable  information  pertaining  to  each  highway  accident 
that has been reported. The information available in the
crash database may be broadly classified based on location
and crash characteristics.

Location : 

The location code in the crash database is identical to
the location code in the RCI data. The only
difference  is  that  in  the  crash  data,  only  one  location  code 
is  required  to  represent  the  spot  where  the  crash  has 
occurred.   The  location  code  common  to  both  databases  helps 
to  link  the  crash  incident  to  the  highway  section  in  which  it 
has  occurred. 

Crash   characteristics : 

Crash characteristics include details such as crash
severity, time of occurrence and weather conditions.

A subroutine was developed to merge the crash data into the
RCI data. While merging, the computer program reads each
record in the crash data in turn and notes its location
code. A search is performed on the RCI data to find the
record whose begin and end location codes bracket the
crash location. When this condition is met, the crash
parameters built into that RCI record are updated based on the
information obtained from the crash data. At the end of
merging, the resulting database contains exactly the same
number of records as the RCI data, regardless of the
number of records in the accident data.
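
The merging step can be sketched in Python. This is a minimal illustration under stated assumptions, not the subroutine used in the study: the field names (route, begin_mp, end_mp, mp, severity) are hypothetical, the nine-digit location code is assumed to have been split into a route key (county and highway number) and a mile point, and the binary search is only one possible way to perform the range lookup.

```python
from bisect import bisect_right


def merge_crashes(sections, crashes):
    """Attach crash counts to inventory sections without adding or dropping sections.

    sections: list of dicts with hypothetical keys 'route', 'begin_mp', 'end_mp'
    crashes:  list of dicts with hypothetical keys 'route', 'mp', 'severity',
              where severity is one of 'pdo', 'inj', 'fat'
    """
    # Every section starts with zero counts, so sections without crashes survive.
    merged = [dict(s, acc=0, pdo=0, inj=0, fat=0) for s in sections]

    # Index the sections by route, sorted by begin mile point, for the range search.
    by_route = {}
    for rec in merged:
        by_route.setdefault(rec["route"], []).append(rec)
    for recs in by_route.values():
        recs.sort(key=lambda r: r["begin_mp"])
    begins = {route: [r["begin_mp"] for r in recs] for route, recs in by_route.items()}

    for crash in crashes:
        recs = by_route.get(crash["route"])
        if not recs:
            continue  # crash on a route that is not in the inventory
        # Last section whose begin mile point is at or before the crash location.
        i = bisect_right(begins[crash["route"]], crash["mp"]) - 1
        if i >= 0 and crash["mp"] <= recs[i]["end_mp"]:
            recs[i]["acc"] += 1
            recs[i][crash["severity"]] += 1
    return merged
```

The output list has one record per input section, in the original order, which mirrors the property stated above: the merged database has as many records as the RCI data regardless of how many crash records are read.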

Highway  Classification 
The  highway  sections  in  the  rural  areas  are 
operationally  different  from  similar  sections  in  the  urban 
areas. A high level of pedestrian activity, a large number of
access points, high traffic volumes, restricted shoulders and
the absence of a safe recovery area are characteristic of
urban highways. Therefore, urban highways and rural highways
are  analyzed  separately. 

Two  highway  sections  with  similar  geometric  and  traffic 
parameters  in  the  same  location  could  still  be  different  from 
each  other  based  on  the  type  of  access.   Highways  with  full 
access  control  fall  under  the  category  of  freeways.   The 
other  two  types  are  partially  access-controlled  highways  and 
highways  with  no  access  control.  Further,  highways  are  also 
classified  based  on  the  presence  of  median  and  based  on  the 
number of lanes.

TABLE 3.1 Classification of Highways

#   Code  Location  Access Control  Median     # Lanes
1   uu2   urban     no control      undivided  2
2   uu4   urban     no control      undivided  4
3   ud4   urban     partial         divided    4
4   uf4   urban     full            divided    4
5   uf6   urban     full            divided    6
6   ru2   rural     no control      undivided  2
7   ru4   rural     no control      undivided  4
8   rd4   rural     partial         divided    4
9   rf4   rural     full            divided    4
10  rf6   rural     full            divided    6


Table  3.1  shows  the  code  used  to  represent  various 
types  of  highways.   The  effect  of  geometric  and  operational 
parameters  on  crash  frequency  depends  on  the  highway  type. 
For example, the effect of lane width on two-lane highways
could be quite different from its effect on six-lane highways.
Therefore,  each  highway  type  needs  to  be  analyzed  separately 
and  then,  if  required,  the  models  developed  could  be  examined 
for  similar  behavior  patterns. 


Two-lane  Urban  Undivided  Highways 
Highways  that  come  under  the  category  of  two-lane, 
urban, undivided highways are considered for this study.
About 2500 highway sections in the State of Florida belong to
this  category.   The  important  features  of  2-lane  urban 
highways  are  section  length,  AADT,  presence  of  on-street 
parking,  number  of  intersections,  number  of  railway 
crossings,  lane  width,  paved  shoulder  width,  unpaved  shoulder 
width,  presence  of  curb  and  coefficient  of  friction. 

The  crashes  that  occur  at  an  intersection  are  not 
included  in  the  study  since  such  crashes  are  dependent  more 
on  the  design  features  of  the  intersection  and  the  type  of 
control  used  than  on  the  characteristics  of  the  highway 
section. Similarly, the crashes that occur on a horizontal
curve are dependent more on the features of the curve
than on the longitudinal features. Therefore, highway sections
with sharp horizontal curvature are not included in the
study.

Highway  sections  that  pass  through  railroad  crossings 
and narrow bridges are also excluded from this study. The
number of highway sections available for this study after
removing sections with sharp curves, railroad crossings, and
narrow bridges is about 2000. Seventy-five percent of these
highway sections are used for analysis and modeling. The
remaining twenty-five percent are used for
testing the models.
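
A 75/25 partition of this kind can be sketched as follows. The text does not say how sections were assigned to the two groups, so the random shuffle, the seed and the function name are assumptions made purely for illustration.

```python
import random


def split_sections(section_ids, train_fraction=0.75, seed=0):
    """Shuffle section identifiers and split them into modeling and testing sets."""
    ids = list(section_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Example with about 2000 hypothetical section identifiers.
modeling, testing = split_sections(range(2000))
print(len(modeling), len(testing))
```

Keeping the testing sections out of the fitting step is what allows them to serve as an independent check on the models.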

Data Statistics
The  minimum,  maximum,  mean,  median,  and  quartile  values 
of  each  parameter  considered  in  the  study  are  shown  in  Table 
3.2.   These  values  are  used  to  find  the  range  at  which  the 
majority  of  the  data  lies.   Section  length  is  measured  in  one 
thousandth  of  a  mile.   Traffic  volume  is  expressed  in  Annual 
Average  Daily  Traffic.   All  cross  sectional  parameters  are 
measured  in  feet  and  speed  limit  is  measured  in  mph. 


TABLE 3.2 Data Statistics

#   Parameter          Code  Min.  1st Q  Median  Mean   3rd Q  Max.
1   Section Length     slen  10    62     144.5   247.2  330    1933
2   Intersections      its   0     0      1       1.76   2      24.5
3   Traffic Volume     adt   913   7442   10890   12090  15580  38680
4   Speed Limit        spd   25    35     45      41.78  45     55
5   Lane Width         lw    9     12     12      11.90  12     15
6   Paved Shoulder     ops   0     0      0       1.21   2      12
7   Unpaved Shoulder   oups  0     4      6       5.87   8      12
8   Total Shoulder     tosh  0     6      8       7.08   8      14
9   Outside Curb       oc    0     0      0       0.10   0      1
10  On-street Parking  pk    0     0      0       0.07   0      1
11  Friction           fr    0     0      0       0.44   1      1
12  Total Crashes      acc   0     0      1       3.36   4      47
13  Property Damage    pdo   0     0      0       1.35   2      26
14  Injury             inj   0     0      1       1.95   2      32
15  Fatality           fat   0     0      0       0.06   0      3
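
The six statistics reported for each parameter can be computed with a few lines of Python. This is only a sketch: the quartile convention used in the study is not stated, so the "inclusive" interpolation method chosen here is an assumption, and the sample values are hypothetical.

```python
import statistics


def six_number_summary(values):
    """Return (min, 1st quartile, median, mean, 3rd quartile, max)."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions give slightly different cut points.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, statistics.fmean(values), q3, max(values))


# Hypothetical section lengths in 1/1000 mile, for illustration only.
slen = [10, 62, 120, 169, 330, 800]
print(six_number_summary(slen))
```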

Visual Display of Data

The statistics that are generally used to summarize a
variable are shown in the previous table. Though
these values help to find the range in which the majority of the
data lie, they do not give any information on the
distribution. To get the full picture, a series of plots was
prepared and is shown in the following pages.

Two  figures  are  used  to  display  each  variable.   The 
first  figure  shows  the  plot  of  the  variable  in  the  order  in 
which  it  exists  in  the  database.   The  x-axis  represents  the 
observation  number  and  the  y-axis  represents  the  value  of  the 
variable  for  each  observation.   This  plot  looks  like  a 
scatter  plot  and  gives  an  idea  of  the  level  at  which  more 
observations  are  concentrated. 

The plot on the right-hand side of each pair shows the
same variable in another order: the observations are sorted in
increasing order of value. The sorted plot helps to identify
regions  where  sufficient  observations  are  not  available.   It 
also  helps  to  identify  the  variables  that  are  categorical  in 
nature.   Plots  prepared  to  display  continuous  variables, 
categorical  variables,  and  logical  variables  are  shown  in 
Figures  3.1,  3.2  and  3.3,  respectively. 


[Figure 3.1 panels, each shown unsorted and sorted: Number of Crashes, Section Length, Number of Intersections, Traffic Volume]

FIGURE 3.1 Unsorted & Sorted Plots of Continuous Variables

[Figure 3.2 panels, each shown unsorted and sorted: Traffic Speed, Lane Width, Paved Shoulder, Unpaved Shoulder]

FIGURE 3.2 Unsorted & Sorted Plots of Categorical Variables

FIGURE 3.3 Unsorted & Sorted Plots of Logical Variables


Observations at the ends of the sorted plots
that are detached from the other observations are extreme
values when compared to most of the other observations.
These observations will be considered carefully during
analysis.   If  inconsistencies  are  observed  in  the  models  at 
any  stage  due  to  these  observations,  all  attempts  will  be 
made  to  find  measures  of  rectifying  such  problems.   If  no 
means  are  available  to  rectify  such  situations,  these  points 
will  be  eliminated  and  the  behavior  of  the  corresponding 
variable  at  such  values  will  be  considered  as  unpredictable. 


CHAPTER  4 
CRASH  DISTRIBUTION 


Crash  Classification 
The  total  number  of  crashes  that  occur  in  a  highway 
section  is  generally  classified  based  on  severity  and  type. 
The classification based on crash severity and the
codes used to represent each class are given in the following
listing.

#  Crash Severity   Code
1  Property Damage  PDO
2  Injury Crashes   Inj
3  Fatal Crashes    Fat
4  All Crashes      Acc

Crash  Frequency  and  Crash  Rate 
Crash  frequency  is  the  total  number  of  crashes  that 
occur  at  a  highway  section  during  a  given  period  of  time, 
regardless  of  length  of  the  section,  AADT,  or  duration  of 
observation.   The  period  of  observation  is  usually  taken  as 
one  year  and  the  length  of  section  is  generally  limited  to 
one mile. Crash rate is a function of crash frequency,
section length and the average annual daily traffic. For any
given highway section, crash rate is defined as the number of
crashes per one million vehicle miles.

Crash rate = f (crash frequency, section length, AADT)
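
The text leaves the function f unspecified. A conventional way to express crashes per million vehicle miles divides the crash count by the exposure (AADT times 365 days times the number of years times the section length in miles) and scales by one million. The sketch below assumes that conventional form; the function name and the one-year default are illustrative, and the section length is taken in thousandths of a mile as in the database.

```python
def crash_rate_per_mvm(crashes, slen_thousandths, aadt, years=1.0):
    """Crashes per million vehicle miles for one highway section."""
    length_miles = slen_thousandths / 1000.0
    vehicle_miles = aadt * 365.0 * years * length_miles  # exposure on the section
    if vehicle_miles <= 0:
        raise ValueError("section needs positive length, AADT and duration")
    return crashes * 1_000_000 / vehicle_miles


# Hypothetical section: 4 crashes in one year, 0.5 mile long, AADT of 10000.
print(round(crash_rate_per_mvm(4, 500, 10000), 3))
```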

Crash  rate  is  generally  used  in  regression  analysis  for 
developing  crash  prediction  models.   When  crash  rate  is 
defined  as  the  response  variable,  it  is  assumed  that  the 
probability  for  a  crash  to  occur  does  not  depend  on  the 
traffic  volume  or  on  the  uniformity  of  design.   When  number 
of  crashes  per  section  is  considered  as  the  dependent 
variable  and  when  section  length  and  AADT  are  treated  as 
independent variables, the effects of traffic volume and
uniformity in highway design on the number of crashes are
also taken into consideration while modeling.

In  this  study,  crash  frequency  is  considered  as  the 
response  variable  regardless  of  the  section  length  or  the 
AADT.   Section  length  and  AADT  are  treated  as  independent 
variables  that  can  influence  the  occurrence  of  number  of 
crashes  in  a  given  highway  section. 

Frequency  Distribution 
The highway sections can be grouped into classes based
on the number of crashes that occur in each section. A
frequency distribution table can be constructed using the
counts in each class or scores in each interval. The shape
of the frequency distribution can be seen from a plot of
the number of sections in each class against the class. The
resulting plot is known as a histogram.
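
A frequency distribution table of this kind can be built directly from the per-section crash counts. The sketch below uses hypothetical counts and a hypothetical function name; each class is a single crash count, and classes with no sections are kept so that gaps remain visible in the histogram.

```python
from collections import Counter


def frequency_table(crash_counts):
    """Map each crash count 0..max to the number of sections with that count."""
    counts = Counter(crash_counts)
    return {k: counts.get(k, 0) for k in range(max(crash_counts) + 1)}


# Hypothetical crash counts for six sections.
table = frequency_table([0, 0, 1, 3, 0, 1])
for crashes, sections in table.items():
    print(crashes, "#" * sections)  # crude text histogram
```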

Figure 4.1 shows the distribution of all crashes in two-lane
urban highway sections in the State of Florida. The
plot on the left side is a scatter plot obtained by
plotting the crash frequency of each highway section. The plot
on the right side shows the histogram prepared from the
crash distribution table.

According to the histogram, the crash frequency
distribution is single sided, with a large number of highway
sections having no crashes. The number of highway sections
decreases as the number of crashes per section
increases. Beyond fifteen crashes per section, the number
of highway sections gets close to zero while the curve tends
to become asymptotic to the x-axis.

FIGURE 4.1 Distribution of Actual Crash Data

Three important observations from the histogram plot are
listed below.

1. There are more than 600 highway sections that have zero
crashes.

2.  About  50  highway  sections  have  more  than  15  crashes. 

3.  The  distribution  is  single  sided. 

Poisson  Distribution 

The  Poisson  distribution  is  widely  used  to  model  count 
data.   Rarely  occurring  events  are  generally  represented 
using  Poisson  distribution.   The  shape  of  the  distribution 
depends  only  on  one  parameter,  the  mean  value  of  the  data. 
In  other  words,  the  mean  determines  the  shape  of  the 
distribution. 

The Poisson distribution has generally been accepted as the
standard distribution function to represent crash
frequencies. The Poisson distribution models the probability
of y events or incidents occurring according to a Poisson process,
with the probability given by the following expression.

$p(y; \mu) = \dfrac{e^{-\mu} \mu^{y}}{y!}$

Where,

y = 0, 1, 2, 3, ...

$\mu$ = mean value of the sample

The variance of the distribution is assumed to be equal to
the mean of the distribution.
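
The expression above can be evaluated numerically as a quick check. The sketch below is illustrative only; the function name is hypothetical, and the calculation is done on the log scale so that the factorial does not overflow for large y.

```python
import math


def poisson_pmf(y, mu):
    """Probability of exactly y events when the mean is mu (mu > 0)."""
    # log p = -mu + y*log(mu) - log(y!), with log(y!) = lgamma(y + 1)
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


# With a mean of 3.2 crashes, the probability of a zero-crash section
# is exp(-3.2), roughly 0.041.
print(round(poisson_pmf(0, 3.2), 3))
```

Because the whole shape is driven by the single parameter $\mu$, a Poisson model with a mean near 3 leaves very little probability for zero-crash sections.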


Negative  Binomial  Distribution 

The negative binomial distribution is similar to the
Poisson distribution but, unlike the Poisson distribution, it
allows the variance to be much larger than the mean. The
mean and variance of the negative binomial model can be
written as follows.

$E(Y \mid x) = \mu(x)$

$V(Y \mid x) = \mu(x) + \alpha \, \mu(x)^{2}$

Where,

$\alpha$ is referred to as the dispersion parameter.
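
A probability function with exactly this mean and variance can be written in terms of $\mu$ and $\alpha$. The sketch below assumes the common parameterization with size $r = 1/\alpha$ and success probability $p = r/(r+\mu)$; the text does not state which parameterization the study used, and the function name is hypothetical.

```python
import math


def nb_pmf(y, mu, alpha):
    """P(Y = y) for a negative binomial with mean mu and variance mu + alpha*mu**2."""
    r = 1.0 / alpha          # "size" parameter implied by the dispersion
    p = r / (r + mu)         # success probability in the classical form
    log_pmf = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               + r * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)


# Illustrative values only: mean 3.2 and dispersion 1.49.
mu, alpha = 3.2, 1.49
mean = sum(y * nb_pmf(y, mu, alpha) for y in range(3000))
var = sum((y - mu) ** 2 * nb_pmf(y, mu, alpha) for y in range(3000))
print(round(mean, 3), round(var, 3), round(mu + alpha * mu ** 2, 3))
```

As $\alpha$ approaches zero the extra variance term vanishes and the distribution approaches the Poisson.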

Rejection  of  Poisson  Distribution 
The  assumption  that  crash  frequency  is  distributed 
according  to  a  Poisson  distribution  is  rejected  based  on  the 
results  from  three  experiments. 

Violation   of  Mean-Variance  Equality: 

From the observed values of crashes, the mean, standard
deviation, and variance are calculated. The values obtained
are listed below.

#  Parameter           Value
1  Range               0-60
2  Mean                3.2
3  Standard Deviation  6.1
4  Variance            37.2

The Poisson distribution assumes that the variance is equal
to the mean. The mean value of crash frequency is 3.2, while
the variance is 37.2 (>> mean). Therefore the basic
assumption used to develop the Poisson model is violated.
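
The size of the violation can be quantified from the two reported statistics. The sketch below computes the variance-to-mean ratio (which is 1 for a Poisson variable) and a method-of-moments value of the dispersion parameter obtained by solving variance = mean + alpha * mean^2. Both are illustrative calculations on the raw counts and are not the regression-based estimate reported by the study, which conditions on the explanatory variables and need not match.

```python
def dispersion_summary(mean, variance):
    """Return (variance/mean ratio, moment estimate of alpha)."""
    ratio = variance / mean
    alpha = (variance - mean) / mean ** 2  # from variance = mean + alpha * mean**2
    return ratio, alpha


ratio, alpha = dispersion_summary(3.2, 37.2)
print(round(ratio, 2), round(alpha, 2))
```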

Over-dispersion   Coefficient   exceeds   1: 

A  test  for  over-dispersion  was  performed  using  the 
outputs  from  the  procedure  that  estimates  the  negative 
binomial  regression  in  the  statistical  analysis  package, 
LIMDEP  by  Green  [2].   If  the  over-dispersion  factor  exceeds 
1,  the  distribution  is  assumed  to  be  negative  binomial.   The 
over-dispersion  factor  estimated  by  the  regression  analysis 
was  1.49. 

Disagreement   in   Shape   of  Distribution: 

The  mean  number  of  crashes  for  two  lane  highways  is 
calculated  from  the  crash  data.   All  statistics  obtained  from 
the  actual  crash  frequencies  may  be  used  to  generate 
theoretical frequencies that follow any assumed distribution.

A vector of length "n" is generated randomly using a
Poisson  distributed  random  number  generator.   The  number  of 
elements,  (n)  is  made  equal  to  the  number  of  highway 
sections.   The  value  of  parameters  used  to  drive  the  random 
number  generator  is  obtained  from  the  actual  crash  data.   The 
distribution  of  the  resulting  vector  is  expected  to  look  like 
the  distribution  of  actual  crash  data. 

To  compare  this  theoretical  crash  data  with  the  actual 
crash  data,  a  scatter  plot  and  a  histogram  plot  are  prepared 
using  the  randomly  generated  crash  data.   The  plots  thus 
obtained are shown in Figure 4.2.

FIGURE 4.2 Distribution of Poisson Data

Three important observations from the histogram plot are
listed below.

1.  The  number  of  highway  sections  that  had  no  crashes 
during  the  observation  period  is  less  than  100. 

2.  There  are  no  highway  sections  with  crash  frequency 
greater  than  10. 

3.  The  distribution  is  double  sided  with  short  tails. 

None of these observations agrees with the observations
made from the actual frequency distribution. A similar procedure is used to
generate another vector of random numbers that follow a
negative binomial distribution based on the actual crash
statistics. The scatter plot and histogram plot are shown in
Figure 4.3.
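
The two simulation experiments can be sketched with the Python standard library. The generators below are stand-ins for whatever random number routines the study used: Poisson values are drawn with Knuth's multiplication method, and negative binomial values as a gamma-Poisson mixture. The dispersion value of 3.32 is a moment estimate from the reported mean (3.2) and variance (37.2) and is an assumption, as are the seed and the function names.

```python
import math
import random


def poisson_draw(rng, mu):
    """One Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def negbin_draw(rng, mu, alpha):
    """One negative binomial variate with mean mu and variance mu + alpha*mu**2."""
    # Gamma-distributed rate (shape 1/alpha, scale alpha*mu), then a Poisson draw.
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return poisson_draw(rng, lam)


def simulate(n, mu, alpha, seed=1):
    """Return one Poisson and one negative binomial vector, each of length n."""
    rng = random.Random(seed)
    poisson = [poisson_draw(rng, mu) for _ in range(n)]
    negbin = [negbin_draw(rng, mu, alpha) for _ in range(n)]
    return poisson, negbin


poisson, negbin = simulate(2000, 3.2, 3.32)
for name, sample in (("Poisson", poisson), ("Negative binomial", negbin)):
    zeros = sample.count(0)
    above_15 = sum(1 for v in sample if v > 15)
    print(name, "zero-crash sections:", zeros, "sections above 15:", above_15)
```

The counts of zero-crash sections and of sections above 15 crashes are the same two quantities read off the histograms in the text, so the simulated vectors can be compared with the actual data on the same terms.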

FIGURE 4.3 Distribution of Negative Binomial Data

Three important observations from the histogram plot are
listed below.

1.  About  600  highway  sections  had  zero  crashes. 

2.  About  50  highway  sections  experienced  more  than  15 
crashes.

3.  The  distribution  is  single  sided. 

Conclusion

All these observations agree with the observations made
from the actual crash distribution. Based on the results
obtained from all three experiments described in the
previous sections, it can be concluded that the
distribution of total crashes that occur on two-lane urban
highways follows a negative binomial distribution.

The distributions of PDO crashes, injury crashes and
fatal crashes were also checked using the same procedure. The
following results were obtained.

#  Crash Type      Distribution
1  PDO crashes     Negative Binomial Distribution
2  Injury crashes  Negative Binomial Distribution
3  Fatal crashes   Poisson Distribution

The fatal crashes were of very rare occurrence and there were
no signs of over-dispersion.


CHAPTER  5 
EXPLANATORY  VARIABLES 


This  chapter  gives  an  introduction  to  all  the  variables 
that  are  believed  to  contribute  to  the  occurrence  of  crashes. 
Such  variables  that  may  be  able  to  explain  the  occurrence  of 
crashes  are  termed  explanatory  variables.   The  explanatory 
variables  are  classified  into  longitudinal  factors, 
operational  factors  and  cross  sectional  factors. 

•  Longitudinal  factors  include  section  length,  number  of 
intersections,  level  crossings  and  narrow  bridges. 

•  Operational  factors  include  traffic  volume,  speed  limit 
and  on-street  parking  conditions. 

•  Cross  sectional  factors  include  lane  width,  shoulder 
width,  paved  shoulder  width  and  unpaved  shoulder  width. 

The  Highway  Section 
A  highway  section  is  defined  as  a  uniform  stretch  of 
roadway  for  which  the  operational  factors  and  cross  sectional 
factors remain unchanged. The length of a highway section
usually ranges between 0.5 and 2.5 miles. Since
intersections are not considered as constraints in
determining highway section boundaries, a highway section may
contain several intersections.

Changes in geometry, speed limits, parking regulations
or traffic volumes result in a highway being
divided into several smaller highway sections. Therefore
a longer highway section implies design consistency while
several short highway sections imply irregularities in
design.

There  is  a  possibility  for  irregularities  in  design  to 
contribute  to  crashes.   Therefore  each  highway  section  is 
considered  as  one  observation  in  this  study  rather  than 
considering  sections  of  one  mile  length.   The  section  length 
will  be  considered  as  one  of  the  explanatory  variables. 

Longitudinal  Parameters 

The  section  length  is  the  most  important  longitudinal 
parameter  of  the  highway  section.   Other  factors  include 
number  of  intersections,  number  of  railway  crossings  and 
number  of  narrow  bridges. 

The  crashes  that  occur  at  an  intersection  depend  more 
on  the  design  aspects  and  operational  features  of  the 
intersection  than  on  the  features  of  the  section.   Since  this 
study is focused on modeling crashes as a function of
the highway features, the crashes that occur at
intersections are not included as part of the response
variable.

Even  though  the  crashes  that  occur  at  intersections  are 
excluded,  there  is  a  possibility  for  mid-block  crash 
frequency  to  be  influenced  by  the  presence  of  intersections. 
Therefore  'number  of  intersections'  is  also  considered  as  one 
of  the  explanatory  variables  in  this  analysis. 

Figure  5.1  shows  the  pairwise  plot  of  all  longitudinal 
parameters  with  crash  frequency  as  the  first  variable.   A 
pairwise  plot  is  prepared  by  plotting  all  variables  on  a  two- 
dimensional  surface.   Each  plot  represents  two  variables. 
The  plots  give  a  general  idea  of  how  well  the  variables  are 
related  to  each  other.   The  response  variable  is  shown  as  the 
first  variable  in  each  plot. 

The  plots  may  be  explained  using  the  following 
examples. In Figure 5.1, the plot corresponding to the 1st row
and 3rd column was prepared by plotting number of
intersections on the x-axis and crash frequency on the
y-axis. The plot corresponding to the 1st column and 2nd row was
prepared by plotting crash frequency on the x-axis and
section length on the y-axis.

The  points  in  the  pairwise  plot  are  not  completely 
random.  Therefore,  some  relationship  could  be  expected 
between the variables. Since the points are also spread out,
it can be expected that the behavior of the variables under
consideration is also influenced by other variables.

FIGURE 5.1 Pairwise Plot of Longitudinal Factors

Operational Parameters
Traffic volume, traffic speed and on-street parking are
the important highway operational parameters. Figure 5.2
shows the pairwise plot of all operational parameters with
crash frequency as the first variable.

FIGURE 5.2 Pairwise Plot of Operational Factors

Since traffic volume changes with time it is difficult
to  measure  the  volume  and  record  it  on  an  ongoing  basis. 
Besides,  it  is  practically  impossible  to  know  the  traffic 
volume  or  density  at  the  time  of  the  accident.   Therefore  a 
representative  variable,  average  annual  daily  traffic  (AADT) 
is  used  as  the  variable  to  represent  traffic  volume. 

Similarly, traffic speed also changes with time.
Therefore another representative variable, speed limit, is used to
represent this factor. Speed limit is a function of several
geometric parameters, pavement conditions and sight distance.
Speed limit, when defined as an explanatory variable,
represents the effect of these factors on highway safety.

On-street-parking  is  another  important  operational 
parameter.  It  is  represented  using  a  logical  variable  that 
takes  the  value  zero  if  on-street  parking  is  prohibited  and  a 
value  one  if  on-street  parking  is  permitted  for  that  highway 
section. 

Cross  Sectional  Parameters 
The  cross  sectional  factors  are  lane  width,  shoulder 
width,  median  width,  and  safe  recovery  area  width.   Since  the 
type  of  highway  considered  for  this  study  is  undivided,  the 
median  width  is  zero.   The  safe  recovery  area  is  usually  zero 
for urban highways. The shoulder could be paved,
unpaved, or a combination of both. Since a paved shoulder
can functionally contribute to the width of lane, paved and
unpaved shoulders are considered as two different parameters.
Figure 5.3 shows the pairwise plot of all cross sectional
parameters with crash frequency as the first variable.

FIGURE 5.3 Pairwise Plot of Cross Sectional Parameters


CHAPTER  6 
MODELING  STRATEGIES  &  THE  BASE  MODEL 


This chapter gives an introduction to the statistical
methods used in the study. Criteria used for accepting or
rejecting models and for preferring one model over other
models are also discussed briefly in this chapter.

Assumptions  made  based  on  the  insights  obtained  from 
literature  review,  visualization  of  actual  data  and  based  on 
known  statistical  concepts  are  used  to  develop  the  base 
model.   This  model  and  all  relevant  model  parameters  are 
displayed  and  discussed  briefly  in  this  chapter. 

There  is  a  need  to  validate  or  reject  these  assumptions 
based  on  statistical  inference.  The  next  chapter  deals  with 
improving  this  model  step  by  step  while  all  assumptions  used 
in  the  base  model  are  evaluated  in  stages. 

Generalized  Linear  Models 
In  ordinary  linear  regression  analysis,  the  errors  are 
assumed  to  be  distributed  normally.   Therefore  the 
properties  of  least  squares  estimates  are  stronger  when  the 
errors  actually  follow  normal  distribution  than  when  they 
are not normal. Most of the time the errors are not
normally distributed. In cases where response variables are
of rare occurrence, the errors are seldom normal. In such
situations,  the  models  developed  using  linear  models  become 
highly  unreliable  even  though  a  very  good  fit  can  be 
attained  through  sophisticated  modeling. 

Generalized linear models (GLM), introduced by
Nelder and Wedderburn (1972), are a generalized approach to
linear models in which a wide range of
error distribution families is accommodated. Generalized
linear models are specified by three components: random,
systematic, and link.

The  random  component  identifies  the  probability 
distribution  of  the  response  variable.   Since  crash 
frequency  represents  counts  it  is  discrete  in  nature  and 
follows  a  distribution  pattern.   The  systematic  component 
specifies  a  linear  function  of  explanatory  variables  that  is 
used  as  a  predictor.   Section  length,   traffic  volume,  speed 
limit,  lane  width  and  shoulder  width  are  examples  of 
explanatory  variables  that  can  be  expressed  in  the  linear 
form  as  given  below. 

$\text{systematic component} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$

where,

$x_1, x_2, x_3$ are the independent variables or functions of
independent variables,

$\beta_0$ is the y-intercept and

$\beta_1, \beta_2, \beta_3$ are coefficients of the independent variables.

The  link  component  of  the  Generalized  Linear  Model 
links  expected  values  of  observations  to  explanatory 
variables through a specified function. For Poisson and
negative binomial models, the link function could be the
natural log.
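
With a natural log link, the expected crash count is recovered by exponentiating the systematic component. The sketch below shows that step only; the coefficient values and variable names are hypothetical placeholders, not estimates from this study.

```python
import math


def expected_crashes(intercept, coefficients, values):
    """Expected count under a log link: exp(beta_0 + sum of beta_i * x_i)."""
    eta = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return math.exp(eta)  # inverse of the log link


# Hypothetical coefficients, for illustration only.
betas = {"slen": 0.002, "adt": 0.00005}
section = {"slen": 250, "adt": 12000}
print(round(expected_crashes(-1.0, betas, section), 3))
```

One consequence of the log link is that the predicted count is always positive, which a plain linear model of counts cannot guarantee.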

Model  Statistics 
Definitions of some important terms used to express
the model characteristics and reliability are given in the
following sections.

Regression Coefficient ($\beta$):

The parameters $\beta_0$ and $\beta_1, \dots, \beta_n$ are called regression
coefficients. $\beta_0$ is the y-intercept of the regression model,
and $\beta_1, \dots, \beta_n$ represent the change in the expected value of
the response variable per unit change in each independent
variable.

Sum of Squares (SS):

The sum of squares of the model is a measure of the
variability in the response variable that has been explained
by the model. The sum of squares of an individual parameter
represents the portion of the model sum of squares that has
been contributed by that parameter.

F-test:

The value of $\beta$ depends on the units used to represent
the corresponding parameter. For example, the $\beta$ obtained
when expressing section length in miles will be one thousand
times the value of $\beta$ obtained when section length is
expressed in one-thousandths of a mile. The magnitude of $\beta$
itself is therefore not a clear indication of its significance.

The  F-test  can  be  used  to  find  the  relative  importance 
of  one  term  with  respect  to  another.   If  a  significant 
amount  of  extra  variance  can  be  eliminated  (explained)  by 
including  the  term,  its  presence  is  justified. 

t-test:

The t-test is similar to the F-test, except that the
t-test takes the direction (sign) of the coefficient into
consideration. The t-value may be written as follows:

$t = \beta_j / (s \sqrt{c_{jj}})$

where,
$\beta_j$ is the coefficient of the j-th term,

s is the residual standard error of the model, so that
$s \sqrt{c_{jj}}$ is the standard error of $\beta_j$, and

$c_{jj}$ is the j-th diagonal element of the $(X'X)^{-1}$ matrix.
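
As a quick numerical check of this ratio, the sketch below
divides a coefficient by its standard error. The two numbers are
the value and standard error reported for log(slen) in the base
model coefficient listing later in this chapter; the function
name is a hypothetical helper introduced here.

```python
def t_value(coef, std_err):
    """t-statistic of a coefficient: estimate divided by its standard error."""
    return coef / std_err

# Value and Std Err of log(slen) from the base model coefficient listing.
t_log_slen = t_value(0.764342626, 0.036520748)
```

The result agrees with the t value of about 20.93 listed for
log(slen) in that table.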

p-value:

The p-value measures the level at which the t-statistic
is significant. A p-value of .10 suggests that the
parameter it represents is significant at a confidence level
of 90%. The generally accepted confidence level is 95%,
which corresponds to a p-value of .05.

Likelihood  function: 

The likelihood function for a given data set n is the
probability of n under the sampling model, treated as a
function of the unknown parameters [37, page 40]. The
maximum likelihood (ML) estimates are the parameter values
under which the observed data would have had the highest
probability of occurrence.

Deviance:

The deviance of a fitted model is a function of its
log-likelihood and the log-likelihood of the corresponding
saturated model. It is calculated by finding the difference
between the two log-likelihoods and multiplying it by 2.

The deviance of a generalized linear model is analogous
to the residual sum of squares of a linear model; it is
the weighted residual sum of squares of the model. The
residual degrees of freedom are used to calibrate the
deviance.

Variable Selection Procedure
The value of AIC can be used as the criterion for preferring
one model over another. AIC is short for Akaike's
Information Criterion. For generalized linear models, AIC is
defined as a function of the deviance (D), the degrees of
freedom (p), and an estimate of the dispersion parameter ($\phi$).

$AIC = D + 2p\phi$

A decrease in the value of any of the three parameters
results in a reduction in the AIC value. In all model
selection routines, AIC is used as the criterion for ranking
candidate models, from which the model corresponding to the
minimum AIC value is accepted. It is similar to the Mallows'
Cp criterion, which penalizes the use of additional
regressors to attain the expected quality of fit.

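The definition above can be written as a one-line function. This
is a sketch only: the function name and the numbers in the
example are hypothetical, and the dispersion estimate would come
from the fitted model in practice.

```python
def aic(deviance, n_params, dispersion):
    """AIC as defined in the text: deviance plus 2 * p * dispersion."""
    return deviance + 2.0 * n_params * dispersion

# Hypothetical values, for illustration only.
smaller_model = aic(deviance=100.0, n_params=5, dispersion=1.0)
larger_model = aic(deviance=99.0, n_params=6, dispersion=1.0)
```

In this made-up comparison the extra regressor lowers the
deviance by 1 but costs 2 in penalty, so the smaller model has
the lower AIC and would be preferred.
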
Sequential Variable Selection:

Stepwise variable selection is a search routine that
assists in finding a subset of explanatory variables that
could be included in a multiple regression model. Under
this approach, variables are added to or deleted from the
existing model on the basis of a predefined criterion that
measures the relative improvement of the model with respect
to each variable.

For a given number of variables, the number of models
that can be generated by considering all possible
combinations of all or part of the variables is very large,
so an exhaustive search would require examining a large
number of models. Stepwise variable selection is a technique
used to reduce the number of models that need to be examined
while limiting the risk of missing the best combination of
variables.

To reduce the probability of losing effective
combinations of variables, certain strategies are adopted in
identifying a path that would lead to the best model. The
three general sequential algorithms are discussed briefly in
the following sections.

Forward Selection:

In forward selection, the initial model contains only
the constant term that represents the y-intercept. A set of
models is developed as the second stage, in which each model
contains exactly one term other than the intercept. The
model with the lowest value of AIC is selected, and this model
forms the base to find the next regressor. The process stops
when adding another regressor cannot bring down the AIC
value any further. In this method, a regressor once
selected is never considered for elimination. The parameters
in the final model depend on the order in which variables
enter the model.

Backward Selection:

In backward selection, the first model is developed
using all the regressors. A series of analyses follows to
identify the regressor that contributes most to the AIC
value. The regressor thus identified is eliminated from the
current model and the procedure is repeated to find the next
regressor. This procedure is stopped when eliminating
another term cannot reduce the AIC value any further.

In this method, a regressor once rejected will not be
reconsidered for inclusion in the model. Therefore
the final model selected by this process depends on the order
in which parameters get rejected.

Stepwise Regression:

Stepwise regression is a modification of forward
selection. In each stage of selection, all regressors
currently in the model are re-evaluated to justify their
presence alongside the new variable that was added.
Therefore a regressor that entered the model at one stage
may be eliminated at another stage. The procedure is
terminated when no regressor can bring about an
improvement in the AIC value either by leaving the model or
by entering the model. Stepwise regression methodology is
used in the analysis.

Stepwise  model  selection  procedure  is  the  generally 
preferred  methodology  for  generalized  linear  models.    The 
procedure  starts  with  an  arbitrary  model  that  has  been  fit 
previously.  The  initial  model  is  improved  in  stages  by 
adding  terms  to  or  deleting  terms  from  the  current  model. 
Each  addition  or  deletion  is  justified  by  the  reduction  in 
the  AIC  statistic. 
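
The search described above can be sketched in a few lines of
Python. This is an illustrative outline, not the routine used in
the analysis: `score` is a hypothetical callable that stands for
"fit the model with these terms and return its AIC", and the toy
scoring function and its numbers are invented so that the example
runs on its own.

```python
def stepwise_select(candidates, score, start=()):
    """Greedy stepwise search driven by a criterion to be minimized.

    At each stage every single-term addition and every single-term
    deletion is tried; the move with the lowest score is taken if it
    improves on the current model, otherwise the search stops.
    """
    current = frozenset(start)
    best = score(current)
    while True:
        moves = [current | {v} for v in candidates if v not in current]
        moves += [current - {v} for v in current]
        if not moves:
            return current, best
        candidate = min(moves, key=score)
        candidate_score = score(candidate)
        if candidate_score >= best:  # no move lowers the criterion
            return current, best
        current, best = candidate, candidate_score

# Toy criterion (hypothetical numbers): each term removes some
# "deviance" and costs a penalty of 2, mimicking D + 2p.
gains = {"slen": 50.0, "adt": 30.0, "Iw": 0.5}

def toy_aic(subset):
    return 100.0 - sum(gains[v] for v in subset) + 2.0 * len(subset)

selected, best_score = stepwise_select(list(gains), toy_aic)
```

In the toy run, the two terms whose gain exceeds the penalty are
kept and the third is left out, which mirrors how a weak
regressor is rejected by the AIC criterion.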

Model  Performance  Criteria 

At the model building stage, the conditions under which
the models should perform are not known. The models
developed through regression analysis need to be cross
validated for reliability in application.

Fitting Sample and Testing Sample:

Prior to regression analysis, the data is split to form
two data sets. The larger set forms the fitting sample and
the smaller set forms the testing sample. All appropriate
candidate models are developed and their coefficients are
computed using the fitting sample. The testing sample is
then used to estimate the performance of the fitted models.

The following procedure is used to obtain an unbiased
split of the data in the ratio 75:25. A random number in the
range 0 - 1 is generated for each record. If the random
number is less than or equal to .75, the record is included
in the fitting sample; if it is greater than .75, the record
is included in the testing sample. This process is
repeated for each observation.
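
A minimal sketch of this record-by-record split is given below.
The function name, the `seed` argument and the use of Python's
standard `random` module are assumptions made for illustration;
the original analysis may have used a different generator.

```python
import random

def split_records(records, fit_fraction=0.75, seed=None):
    """Assign each record to the fitting or testing sample at random."""
    rng = random.Random(seed)
    fitting, testing = [], []
    for record in records:
        # One uniform draw per record; draws up to the threshold go
        # to the fitting sample, the rest to the testing sample.
        if rng.random() <= fit_fraction:
            fitting.append(record)
        else:
            testing.append(record)
    return fitting, testing

# Record indices stand in for the 1934 observations.
fitting, testing = split_records(range(1934), seed=0)
```

Because each record is assigned independently, the realized
split is only approximately 75:25, which is consistent with the
sample sizes reported below.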

Total  number  of  observations  =  1934 

Observations  in  fitting  sample  =  1466 

Observations  in  testing  sample  =  468

All  acceptable  models  developed  at  each  stage  of 
analysis  using  the  fitting  sample  can  be  compared  or  ranked 
based  on  the  quality  of  prediction  on  the  testing  sample. 
The  performance  of  two  or  more  models  can  be  compared  by  the 
relative  accuracy  of  prediction.   Two  norms  are  developed  to 
automate  computations  and  to  develop  a  performance  table. 
The procedures used to develop the norms are described in
the following sections.

Norm including all observations:

The prediction errors from the estimated response for
each model can be used to generate norms that form the
criteria for accepting, rejecting, or ranking the candidate
models. The Mean Absolute Deviation in prediction can be
used as the criterion for comparing the relative performance
of any two models.

$MAD = \frac{1}{n} \sum_{i=1}^{n} | observed_i - predicted_i |$

where MAD is the mean absolute deviation, and
n stands for the number of observations in the fitting sample.
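
The first norm can be computed as in the sketch below. The
function name and the four observed and predicted values are
hypothetical and serve only to show the arithmetic.

```python
def mean_absolute_deviation(observed, predicted):
    """Average of |observed - predicted| over all observations."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(o - p) for o, p in zip(observed, predicted))
    return total / len(observed)

# Hypothetical crash counts and model predictions.
mad = mean_absolute_deviation([3, 0, 5, 2], [2.5, 1.0, 4.0, 2.0])
```

Here the absolute errors are 0.5, 1, 1 and 0, so the norm is
2.5 / 4 = 0.625 crashes per section.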

Norm   excluding  outliers : 

Mean  Absolute  Deviation  could  be  highly  influenced  by  a 
few  observations  which  may  be  outliers  for  a  particular 
model.   To  nullify  the  effect  of  such  observations,  up  to  5% 
of  the  observations  with  worst  prediction  error  are  excluded 
from  the  computation.    The  absolute  values  of  errors  are 
sorted  in  increasing  order  and  the  last  5%  of  observations 
are  rejected  from  the  calculation.   The  norms  are  included 
along  with  other  model  parameters  in  all  performance  tables 
discussed  in  the  next  chapter. 
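
A sketch of the second norm follows. The text does not state how
the 5% cut-off is rounded or which count is used as the divisor,
so two assumptions are made here and marked in the code: the
number of dropped observations is rounded down, and the mean is
taken over the observations that remain. The function name and
data are hypothetical.

```python
def trimmed_mad(observed, predicted, drop_percent=5):
    """Mean absolute deviation after dropping the worst errors."""
    errors = sorted(abs(o - p) for o, p in zip(observed, predicted))
    if not errors:
        raise ValueError("no observations")
    # Assumption: "up to 5%" is read as rounding the count down.
    n_drop = (len(errors) * drop_percent) // 100
    kept = errors[:len(errors) - n_drop]
    # Assumption: the mean is taken over the retained observations.
    return sum(kept) / len(kept)

# Twenty hypothetical sections; one is badly over-observed.
observed = [1] * 19 + [50]
predicted = [0] * 20
norm2 = trimmed_mad(observed, predicted)
```

With twenty observations one error is dropped, so the single
extreme case no longer dominates the norm.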

Basic  Assumptions  and  the  Base  Model 

A regression model is developed using all regressors as
independent variables and crash frequency as the response
variable. The variable transformations used are based on the
information obtained from the literature review. Traffic
volume and section length are assumed to follow a natural log
transformation. All other variables are represented in the
natural scale. This assumption is based on studies done
by a few analysts at earlier stages. Further experiments
examine whether any other transformation can represent these
parameters better than the default functions. Some
important results of such studies are also discussed in the
next chapter.

Information about variable interactions is not clearly
known at this stage, and interactions are assumed to be
nonexistent for now. All predictors are assumed to be
continuous, though some of them show a categorical nature,
which will be explored at a later stage. The variables
included in the development of this model are listed below.


#    Parameter                   Code   Transformation

1.   Section Length              slen   log
2.   Number of intersections     its    none
3.   Ave. Annual Daily Traffic   adt    log
4.   Posted Speed Limit          spd    none
5.   On-Street Parking           pk     none
6.   Lane Width                  Iw     none
7.   Outside Paved Shoulder      ops    none
8.   Outside Unpaved Shoulder    oups   none
9.   Outside Curb                oc     none
10.  Friction Factor             fr     none

The Model Parameters:

The model parameters and standard model plots are shown
in the following six sections.

I.  The Model:

ace ~ log(slen) + its + log(adt) + spd + pk + Iw + ops + oups + oc + fr
theta = 1.51256, family = negative binomial, link = log


II.  Model Coefficients:

Parameter           Value          Std Err        t value

(Intercept)    -10.321376409   0.710073553   -14.53564405
log(slen)        0.764342626   0.036520748    20.92899676
its              0.064613833   0.011073166     5.83517269
log(adt)         0.856658961   0.056969923    15.03703913
spd             -0.017071908   0.004398058    -3.88169259
pk               0.563012796   0.142831156     3.94180662
Iw               0.002467123   0.033606293     0.07341253
ops             -0.070158874   0.015603061    -4.49648148
oups            -0.021864803   0.012393907    -1.76415740
oc               0.200213092   0.117361492     1.70595217
fr               0.043057361   0.061632129     0.69861875

III.  F Statistics:

Parameter     Df    Sum of Sq    Mean Sq     F Value     Pr(F)

log(slen)      1     785.511    785.5110    662.7017   0.0000000
its            1      74.929     74.9289     63.2142   0.0000000
log(adt)       1     200.624    200.6236    169.2575   0.0000000
spd            1      39.317     39.3174     33.1704   0.0000000
pk             1      14.772     14.7720     12.4625   0.0004281
Iw             1       0.042      0.0417      0.0351   0.8513137
ops            1      16.273     16.2726     13.7285   0.0002191
oups           1       6.407      6.4074      5.4056   0.0202091
oc             1       2.694      2.6937      2.2725   0.1319030
fr             1       0.488      0.4881      0.4118   0.5211775
Residuals   1455    1724.635      1.1853

IV.  Analysis of Deviance Table:

Variable     Df   Deviance   Resid. Df   Resid. Dev    Pr(Chi)

NULL                            1465      3105.902
log(slen)     1   1139.127      1464      1966.775    0.0000000
its           1     72.707      1463      1894.068    0.0000000
log(adt)      1    217.330      1462      1676.738    0.0000000
spd           1     42.244      1461      1634.493    0.0000000
pk            1     15.901      1460      1618.593    0.0000668
Iw            1      0.003      1459      1618.590    0.9575715
ops           1     18.196      1458      1600.394    0.0000199
oups          1      6.506      1457      1593.888    0.0107488
oc            1      2.477      1456      1591.410    0.1154974
fr            1      0.498      1455      1590.912    0.4803596

V.  Model Statistics:

Null Deviance:  3105.902 on 1465 degrees of freedom

Residual Deviance:  1590.912 on 1455 degrees of freedom

Theta:  1.51257,  Standard Error:  0.10623

2 x log-likelihood:  8830.80398

AIC:  1614.967,  MAD:  2.711362

Crash frequency is modeled as a function of section
length (slen), number of intersections in the section
(its), AADT (adt), speed limit (spd), parking regulations
(pk), lane width (Iw), outside paved shoulder width (ops),
outside unpaved shoulder width (oups), presence of outside
curb (oc) and the coefficient of friction of the pavement
surface (fr). The distribution assumed is negative binomial,
and the link function is natural log.
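
To illustrate how the fitted base model turns section attributes
into an expected crash frequency, the sketch below applies the
inverse log link to the coefficients listed in section II. The
function name and the example section are hypothetical; in
particular the input values (including the friction value) are
invented for illustration and are not a record from the data
set.

```python
import math

# Coefficients as reported in section II of the base model listing.
BASE_COEF = {
    "(Intercept)": -10.321376409, "log(slen)": 0.764342626,
    "its": 0.064613833, "log(adt)": 0.856658961,
    "spd": -0.017071908, "pk": 0.563012796, "Iw": 0.002467123,
    "ops": -0.070158874, "oups": -0.021864803,
    "oc": 0.200213092, "fr": 0.043057361,
}

def predict_crashes(slen, its, adt, spd, pk, Iw, ops, oups, oc, fr,
                    coef=BASE_COEF):
    """Expected crash frequency: exp of the linear predictor."""
    eta = (
        coef["(Intercept)"]
        + coef["log(slen)"] * math.log(slen)  # slen in 1/1000 mile
        + coef["its"] * its
        + coef["log(adt)"] * math.log(adt)
        + coef["spd"] * spd
        + coef["pk"] * pk
        + coef["Iw"] * Iw
        + coef["ops"] * ops
        + coef["oups"] * oups
        + coef["oc"] * oc
        + coef["fr"] * fr
    )
    return math.exp(eta)

# Hypothetical section, for illustration only.
example = dict(slen=1000, its=2, adt=10000, spd=45, pk=0, Iw=12,
               ops=4, oups=2, oc=0, fr=0.5)
expected = predict_crashes(**example)
```

Because log(slen) enters with a coefficient of about 0.764,
doubling the section length multiplies the expected frequency by
2^0.764, or roughly 1.7, with all other inputs held fixed.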


VI.  Model Plots:

FIGURE 6.1 Standard Plots of the Base Model.


The variable coefficients ($\beta$'s), the standard error of
each term and the t-statistics are shown in section II.
Section III shows the Sum of Squares imparted by each
variable and the associated F-statistics. The p-values of
most of the terms are less than .05, which shows significance
at a confidence level greater than 95%. The analysis of
deviance is shown in section IV, all important model
statistics are shown in section V and the standard model
plots are given in section VI.

Lane width, a very important parameter, was declared
insignificant by the criteria used to eliminate terms
during stepwise regression. The coefficient of speed limit
suggests that crash frequency decreases as speed limit
increases. This model is studied in detail, and the various
stages of improvement that it goes through before reaching
the final model are discussed in the next chapter.


CHAPTER  7 
STATISTICAL  MODELING 

This chapter deals with various stages of regression
analysis and related issues. The objective of this chapter
is to identify the best way of representing each variable in
the crash prediction model while validating basic
assumptions. Assumptions made about variable transformation
and interactions are reviewed in this chapter.


The analysis starts from the base model presented
in the previous chapter. Each variable in the base model is
examined individually and compared with all possible and
reasonably explainable forms of representation in the model.
A model in which any form of the variable under consideration
is not significant at the 95% confidence level is rejected.

The models that survive this test are compared in
stages to find the best model for each stage. While
examining the transformation function of one variable, it is
assumed that the other variables are represented correctly.
Since this assumption can affect the outcome for the first few
variables that are analyzed, the final models are subjected
to cross checking for confirmation. This error is minimized
when the variables are analyzed in order of their
importance in the model.

The  independent  variables  and  the  sum  of  square  values 
as  represented  in  the  base  model  are  listed  below.   The  sum 
of  squares  value  of  each  variable  in  a  model  is  a  measure  of 
its  relative  contribution  with  respect  to  other  variables  in 
explaining  the  response  variable.   The  base  model  shows  that 
the  section  length  (sum  of  squares  =  785.5)  contributes  more 
than  double  that  of  all  other  variables  put  together.   If 
section  length  is  considered  as  the  first  variable  for 
analysis,  the  error  induced  by  incorrect  representation  of 
other  variables  can  be  minimized. 


Parameter Listing: The Base Model

Parameter         Code        Sum of Sq   Mean Sq    F Value    Pr(F)

Section Length    log(slen)   785.511     785.5110   662.7017   0.0000000
Intersections     its          74.929      74.9289    63.2142   0.0000000
Traffic Volume    log(adt)    200.624     200.6236   169.2575   0.0000000
Speed Limit       spd          39.317      39.3174    33.1704   0.0000000
Parking           pk           14.772      14.7720    12.4625   0.0004281
Lane Width        Iw            0.042       0.0417     0.0351   0.8513137
Paved Shoulder    ops          16.273      16.2726    13.7285   0.0002191
Unpaved Shldr     oups          6.407       6.4074     5.4056   0.0202091
Raised Curb       oc            2.694       2.6937     2.2725   0.1319030
Friction Factor   fr            0.488       0.4881     0.4118   0.5211776

Representing Longitudinal Factors
In the base model, section length was assumed to have a
natural log transformation. Some of the functions generally
used to transform continuous variables in statistical
modeling are natural log, square root and square.

A  series  of  models  are  developed  from  the  base  model 
utilizing  these  transformation  functions  to  represent  section 
length  and  intersections.   The  models  that  survived  the 
confidence  level  test  and  stepwise  regression  are  listed 
below  and  the  corresponding  model  parameters  are  shown  in  the 
table  that  follows. 


#    Model   Section Length   Rejected by stepwise

1.   SL1     log(slen)        Iw, fr
2.   SL2     slen             Iw, fr, oups
3.   SL3     sqrt(slen)       Iw, fr
4.   SL4     slen^2           Iw, fr


TABLE 7.1 Models for representing Section Length

Model Parameter      SL1       SL2       SL3       SL4

Null Deviance        3104.35   2762.75   3075.06   2374.54
Residual Deviance    1590.80   1559.78   1578.20   1548.76
Theta                1.511     1.23      1.485     .9521
Standard Error       .106      .078      .103      .0554
2*log likelihood     8830.28   8710.03   8830.74   8517.54
AIC                  1610.45   1576.89   1597.70   1563.62
Prediction Error     2.71      2.677     2.691     2.691
Error on 95% data    1.924     1.889     1.907     1.893

Parameter Listing: The Model SL2

Parameter        Code       Sum of Sq   Mean Sq    F Value    Pr(F)

Section Length   slen       820.515     820.5146   772.2600   0.00000000
Intersections    its         60.324      60.3240    56.7763   0.00000000
Traffic Volume   log(adt)   204.467     204.4671   192.4424   0.00000000
Speed Limit      spd         36.342      36.3424    34.2051   0.00000001
Parking          pk          10.350      10.3499     9.7412   0.00183721
Paved Shoulder   ops          9.857       9.8567     9.2770   0.00236208
Raised Curb      oc           5.118       5.1177     4.8167   0.02834245

Model SL4, with the square function on section length, was
found to be the best model based on the values of Null
Deviance, Residual Deviance, Standard Error, log likelihood,
and AIC. Model SL2, which represents section length without
any transformation, was found to be the second best. Models
SL1 and SL3, which represent the log and square root
transformations, are rejected since their model statistics
are inferior to those of SL2 and SL4.

The untransformed and square transformed models are
compared further. Though the square transformation gave
better model parameters, its ability to predict crash
frequencies on the testing sample was found to be inferior
(mean error of 2.691 crashes/section) to that of the
untransformed model (mean error of 2.677 crashes/section).
Therefore the untransformed form of section length is
preferred over the square transformed form. The square
transformation will be considered further at other stages.

Interaction between Section Length and Intersections:

The model over-predicted the response variable for long
highway sections with large numbers of intersections. As
section length increases, crash frequency increases. For a
given section length, it is reasonable to expect the crash
frequency to increase as the number of intersections
increases. At the same time, longer sections may be able to
accommodate more intersections than shorter sections within
a defined range of safety.

The  effect  of  intersections  on  crash  frequency  cannot 
be  completely  independent  of  section  length.   A  product  term 
of  intersections  and  section  length  is  introduced  in  the 
model  to  represent  the  combined  effect  of  these  parameters  on 
crash  frequency.   The  coefficient  of  the  product  term  was 
found  to  be  negative  as  expected.   This  term  applies  a 
corrective  measure  against  over  prediction  on  long  sections 
with  a  large  number  of  intersections.   The  p-value  of  the 
product  showed  significance  at  confidence  level  above  95%. 
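
The role of the product term can be made explicit with a short
worked equation. Only the longitudinal terms of the linear
predictor are written out, and the symbols $\beta_s$, $\beta_i$
and $\beta_{si}$ are labels introduced here for illustration
rather than notation from the fitted models.

```latex
\eta = \beta_0 + \beta_s\,\mathrm{slen} + \beta_i\,\mathrm{its}
       + \beta_{si}\,(\mathrm{slen}\times\mathrm{its}) + \dots
\qquad
\frac{\partial \eta}{\partial\,\mathrm{its}} = \beta_i + \beta_{si}\,\mathrm{slen}
```

With $\beta_{si} < 0$, the increase in the linear predictor
contributed by each additional intersection shrinks as section
length grows, which is how the term counteracts over-prediction
on long sections with many intersections.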

The crash frequencies predicted by the resulting model
show great improvement. The plots prepared for the diagnostic
study of the model SL2 are shown in Figure 7.1. The
improvement in prediction for the observations listed in the
previous section is shown in the following listing.

#   Index   Section   # inter-    Actual   Predicted   Predicted
            length    sections             (Before)    (After)

1   276     1310      19          36       105         51
2   641     1871      17.5        15       61          29
3   644     1430      23.5        44       178         21
4   1257    1672      12.5        33       110         40

When compared with the values predicted by the
non-interactive model, the errors for the interactive model
are substantially lower. In the presence of the interactive
term, the best way of representing section length,
intersections and the product term is not known. Therefore
some more models are considered to find the best form of
representing longitudinal factors. The models that were
acceptable for performance comparison are shown in the
following listing. The model parameters are displayed in
Table 7.2.

#   Model   Characteristics

1   LF1     slen, slen^2
2   LF2     slen:its
3   LF3     slen, slen^2, slen:its


TABLE 7.2 Models representing Longitudinal Factors

Model Parameters     LF1       LF2        LF3

Null Deviance        3060.64   2949.865   3063.98
Residual Deviance    1582.36   1570.81    1581.54
Theta                1.47      1.379      1.475
Standard Error       .102      .092       .102
2*log likelihood     8820.56   8784.60    8822.77
AIC                  1604.10   1590.21    1605.45
Pred Error Norm1     2.689     2.6735     2.6875
Pred Error Norm2     1.9067    1.889      1.905

Parameter Listing: The Model LF2

Parameter        Code       Sum of Sq   Mean Sq    F Value    Pr(F)

Section Length   slen       779.733     779.7332   697.0394   0.00000000
Intersections    its         59.487      59.4875    53.1786   0.00000000
Traffic Volume   log(adt)   205.715     205.7146   183.8977   0.00000000
Speed Limit      spd         34.970      34.9698    31.2611   0.00000003
Parking          pk          11.724      11.7238    10.4805   0.00123358
Paved Shoulder   ops         11.628      11.6276    10.3945   0.00129194
Raised Curb      oc           5.580       5.5802     4.9884   0.02566882
Product          slen:its    79.273      79.2732    70.8659   0.00000000

Among  the  listed  models,  LF2  gave  the  best  results  in 
terms  of  model  statistics  and  prediction  errors.   According 
to  this  model,  the  interactive  term  is  able  to  yield  higher 
quality  than  the  introduction  of  a  square  term  in  the  model. 

Representing  Operational  Factors 
In  all  previous  analyses,  AADT  was  assumed  to  follow 
natural  log  transformation.   This  assumption  was  based  on  the 
finding  from  a  few  recent  studies.  In  this  section,  this 
assumption  is  re-evaluated  by  comparing  the  parameters  of  the 
current  model  with  that  of  several  other  models  obtained  by 
assuming  various  transformation  functions  for  AADT  including 
the  untransformed  form.  All  acceptable  models  that  resulted 
from  this  analysis  are  listed  below.   The  corresponding  model 
parameters are shown in Table 7.3.

#   AADT   Characteristics

1   ADT1   log(adt)
2   ADT2   adt
3   ADT3   sqrt(adt)
4   ADT4   adt, sq(adt)

TABLE 7.3 Models for representing AADT

Model Parameter      ADT1       ADT2      ADT3      ADT4

Null Deviance        2949.865   2918.41   2945.30   2941.56
Residual Deviance    1570.81    1570.65   1570.60   1570.86
Theta                1.379      1.3517    1.3754    1.3722
Standard Error       .0921      .0898     .092      .0917
2*log likelihood     8784.60    8770.86   8782.76   8780.90
AIC                  1590.21    1590.05   1590.01   1592.43
Pred Error Norm1     2.6735     2.6659    2.667     2.6703
Pred Error Norm2     1.889      1.8831    1.8880    1.8883

Parameter Listing: The Model AD2

Parameter        Code       Sum of Sq   Mean Sq    F Value    Pr(F)

Section Length   slen       778.889     778.8891   691.0159   0.00000000
Intersections    its         61.090      61.0904    54.1983   0.00000000
Traffic Volume   adt        208.089     208.0886   184.6124   0.00000000
Speed Limit      spd         36.750      36.7503    32.6042   0.00000001
Parking          pk          10.422      10.4216     9.2459   0.00240237
Paved Shoulder   ops          9.941       9.9407     8.8192   0.00302953
Raised Curb      oc           3.597       3.5966     3.1908   0.07426068
Product          slen:its    74.114      74.1145    65.7530   0.00000000


Among the four best models that were accepted for
further comparison, ADT2 gave the best results. ADT2
represents the model corresponding to the untransformed form
of AADT. All model parameters and both norms indicating
relative quality of prediction are superior for ADT2 compared
to those of the other three models. The interaction of AADT
with other parameters, if any, will be discussed in later
sections.

Can   Coefficient   of  Speed  Limit  be  Negative? 

While  examining  the  models  that  were  developed  in  the 
past,  most  of  the  modeling  process  started  with  speed  limit 
as  one  of  the  regressors.   But  the  final  model  did  not 
contain  speed  limit  as  one  of  the  predictors.   None  of  the 
models  presented  in  literature  review  contain  speed  limit. 
When  higher  speed  limit  is  expected  to  result  in  higher  crash 
frequencies,  any  result  contradicting  that  result  looks 
unacceptable  and  can  cause  the  forceful  removal  of  the 
variable  itself  from  the  model. 

As  speed  limit  increases,  can  the  crash  frequency 
decrease?   A  few  models  were  displayed  in  the  previous 
sections.   In  all  these  models,  speed  limit  was  found  as  a 
very  significant  parameter.   But  the  coefficient  of  speed 
limit  in  all  these  models  was  consistently  negative.   When 
the  p-value  of  this  variable  is  close  to  0,  its  importance  in 
the model is undeniable, though its credibility looks
suspicious.



Speed  limit  is  not  a  truly  independent  variable.  Higher 
speed  limit  is  associated  with  higher  design  standards. 
Higher  design  standard  is  associated  with  better  physical 
highway  features.   Examples  of  measurable  features  are  wider 
pavement  and  wider  shoulder.   Features,  which  are  difficult 
to  measure  include  better  pavement  conditions,  drainage 
conditions,  sight  distance  and  access  control. 

According  to  these  models,  crash  frequency  decreases  as 
speed  limit  increases.   If  this  assumption  is  completely  true 
then,  most  of  the  efforts  to  increase  safety  should  focus  on 
attaining  higher  highway  design  standards  that  would  call  for 
higher  speed  limits.   Model  SPDl,  given  in  Table  7.4 
represents  the  model  obtained  by  treating  speed  limit  as  a 
continuous  variable  where  the  coefficient  assigned  to  speed 
limit  through  regression  analysis  is  negative. 

Categorical Treatment of Speed Limit:

As discussed in the previous section, a higher speed
limit might be associated with higher safety and lower crash
frequency. The extent to which this concept holds is
better understood by treating speed limit as a categorical
variable. The following listing shows the method used to
redefine speed limit as a categorical variable.

#   spd   spdc   spd0   spd1   spd2   spd3   spd4   spd5   spd6
1   25    0      1      0      0      0      0      0      0
2   30    1      0      1      0      0      0      0      0
3   35    2      0      0      1      0      0      0      0
4   40    3      0      0      0      1      0      0      0
5   45    4      0      0      0      0      1      0      0
6   50    5      0      0      0      0      0      1      0
7   55    6      0      0      0      0      0      0      1
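The recoding in the listing above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the first speed level is assumed to be 25 mph (the listing shows speed limits in 5 mph steps up to 55 mph).

```python
# Speed limits (mph) in increasing order; the first level (25) is an assumption.
SPEED_LEVELS = [25, 30, 35, 40, 45, 50, 55]


def speed_category(spd):
    """Return the category code spdc (0..6) for a posted speed limit."""
    # list.index raises ValueError for a speed limit that is not listed
    return SPEED_LEVELS.index(spd)


def speed_indicators(spd):
    """Return the indicator values [spd0, ..., spd6] for one speed limit."""
    code = speed_category(spd)
    return [1 if k == code else 0 for k in range(len(SPEED_LEVELS))]
```

Exactly one indicator is 1 for each section, so one of the seven indicators is redundant once an intercept is present in the model.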

Model  SPD2  in  Table  7.4  represents  the  model  obtained 
by  giving  categorical  treatment  to  speed  limit.   Though  the 
model  parameters  did  not  improve,  the  same  trend  was 
observed.   As  speed  limit  increased,  crash  frequency 
decreased. 

The  categorical  treatment  of  speed  limit  also  displayed 
some  quadratic  trend.   Therefore  the  square  term  was  added 
to  see  if  the  continuous  term  for  representing  speed  limit 
could  still  be  used  without  losing  the  behavior  pattern  at 
higher  speed  limits.   Model  SPD3  represents  the  model 
resulting  from  including  the  quadratic  term  of  speed  limit. 


#   Model   Speed Limit   Characteristics
1   SPD1    spd           speed limit continuous
2   SPD2    spdc          speed limit is categorical
3   SPD3    spd^2         square transformed

TABLE 7.4  Models for representing Speed Limit

Model Parameter       SPD1        SPD2        SPD3
Null Deviance         2918.412    2948.45     2977.26
Residual Deviance     1570.648    1571.23     1570.31
Theta                 1.3517      1.376       1.402
Standard Error        .0898       .0920       .094
2*log likelihood      8770.856    8783.55     8797.05
AIC                   1590.052    1601.532    1613.75
Pred Error Norm1      2.6659      2.6668      2.6725
Pred Error Norm2      1.8831      1.886       1.8923

Parameter Listing: The Model SPD1 (= AD2)

Parameter        Code       Sum of Sq   Mean Sq    F Value    Pr(F)
Section Length   slen       778.889     778.8891   691.0159   0.00000000
Intersections    its         61.090      61.0904    54.1983   0.00000000
Traffic Volume   adt        208.089     208.0886   184.6124   0.00000000
Speed Limit      spd         36.750      36.7503    32.6042   0.00000001
Parking          pk          10.422      10.4216     9.2459   0.00240237
Paved Shoulder   ops          9.941       9.9407     8.8192   0.00302953
Raised Curb      oc           3.597       3.5966     3.1908   0.07426068
Product          slen:its    74.114      74.1145    65.7530   0.00000000

A stepwise regression on SPD3 rejected the quadratic
term of speed limit from the model. Therefore SPD3 is not
considered in the selection process. Between SPD1 and SPD2,
SPD1 has better model parameters and better prediction
quality. All parameters that correspond to SPD1 are superior
to those of SPD2, which indicates that defining speed limit as
a continuous variable is the better of the two options.

Though SPD2, the categorical model, is not selected as
the best model, it has given some very powerful results which
help to accept the results of SPD1 with more confidence:
"higher speed limits are associated with safer highway
sections."
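The comparison between SPD1 and SPD2 amounts to picking the smaller value of each criterion. A minimal Python sketch using the values reported in Table 7.4 is shown below; the dictionary layout and the helper name best_by are illustrative only.

```python
# Values copied from Table 7.4. SPD3 is left out because stepwise
# regression rejected its quadratic term.
models = {
    "SPD1": {"aic": 1590.052, "norm1": 2.6659, "norm2": 1.8831},
    "SPD2": {"aic": 1601.532, "norm1": 2.6668, "norm2": 1.886},
}


def best_by(criterion):
    """Return the model name with the smallest value of the criterion."""
    return min(models, key=lambda name: models[name][criterion])


# Best model under each criterion (smaller is better for all three).
choices = {c: best_by(c) for c in ("aic", "norm1", "norm2")}
```

All three criteria point to the same model here, which is why the selection is unambiguous.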

Representing  Cross  Sectional  Factors 

Lane width is commonly regarded as the most important
parameter among all cross sectional variables, followed by
paved shoulder width, with unpaved shoulder width regarded
as the least important. In terms of safety, this belief need
not be true. Though the lane provides the primary function,
which is moving the traffic, the shoulder plays a major role
in emergency situations.

On rural highways, some clear area is provided beyond
the unpaved shoulder. This area is called the safe recovery
area. It gives vehicles in danger a very high chance of
recovering safely. For urban highways, this provision is
usually absent due to the unavailability of adequate
right-of-way.

The Lane Width Problem:

All studies done in the past have concluded that lane
width is an important parameter with a significant
influence on crash frequency. Moreover, it is reasonable
and logical to expect lane width to have a substantial
effect on safety. In all models, however, the p-value of
lane width was high, and stepwise regression procedures
consistently rejected lane width and prevented it from
becoming one of the predictors.

In  the  following  sections,  some  methods  are  used  to 
identify  the  behavior  of  lane  width.   If  lane  width  does  not 
affect  crash  frequencies,  it  will  be  a  strange  result.   If 
lane  width  does  affect  crash  frequency,  there  must  be  an 
underlying  behavior  pattern,  which  could  be  preventing  it 
from  staying  in  the  prediction  models. 

Categorical    Treatment   of  Lane   width: 

The value of lane width ranges from 9 feet to 15 feet.
The sorted plot of lane width [Figure 4.2] shows that it is
discrete in nature and assumes only integer values. New
indicator variables were defined to treat lane width as a
categorical variable. The values assumed by these new
variables at the various levels of lane width are
shown in the following listing.


#   Lane width   lw0   lw1   lw2   lw3   lw4   lw5
1    9 feet      1     0     0     0     0     0
2   10 feet      1     0     0     0     0     0
3   11 feet      0     1     0     0     0     0
4   12 feet      0     0     1     0     0     0
5   13 feet      0     0     0     1     0     0
6   14 feet      0     0     0     0     1     0
7   15 feet      0     0     0     0     0     1

The  seven  discrete  values  of  lane  width  (9,  10,  11,  12, 
13,  14,  and  15  feet)  were  put  into  six  categories.   The  first 
two  discrete  values  are  identified  by  the  same  category  since 
there  are  few  highway  sections  with  9  feet  wide  lanes.  The 
model  obtained  from  categorical  treatment  of  lane  width  is 
LW2. 
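The merging of the 9 and 10 feet levels into one category can be sketched as follows. The function names are hypothetical; the mapping follows the listing above.

```python
def lane_width_category(lw_feet):
    """Map an integer lane width (9..15 ft) to a category 0..5.

    Widths of 9 ft and 10 ft share category 0 because few sections
    have 9 ft lanes.
    """
    if lw_feet not in range(9, 16):
        raise ValueError("lane width outside the observed 9-15 ft range")
    return max(lw_feet - 10, 0)


def lane_width_indicators(lw_feet):
    """Return the indicator values [lw0, ..., lw5] for one lane width."""
    code = lane_width_category(lw_feet)
    return [1 if k == code else 0 for k in range(6)]
```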

In model LW2, the variable lwc, which represents lane
width defined as a categorical variable, has become
significant with a low p-value. The model parameters
are shown in Table 7.5. They are not superior to those of
the other forms representing lane width. Therefore this
model is also rejected.

The coefficients of model LW2 revealed a characteristic
behavior pattern. As lane width increases, the crash
frequency decreases initially, but as lane width is increased
further, the crash frequency increases instead of
decreasing. When a straight line is fitted, this trend is
approximated by a horizontal line (slope 0) rather than an
inclined line. This behavior prevented lane width from being
a significant parameter in the prediction model as a
continuous variable. Since the relationship is nonlinear, a
square term is included to capture the behavior of lane
width. The resulting model, LW3, is also presented in
Table 7.5. The effect of lane width on safety could be
greatly influenced by the availability of paved shoulder
width. A product term representing the interaction between
lane width and paved shoulder width was rejected based on AIC.
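Why a U-shaped effect disappears in a straight-line fit, and why a square term recovers it, can be illustrated with a small numerical sketch. The effect values below are hypothetical and symmetric by construction; they are not taken from the LW2 coefficients.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs (with an intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# HYPOTHETICAL U-shaped effect of lane width, lowest near 12.5 ft.
widths = [10, 11, 12, 13, 14, 15]
effect = [(w - 12.5) ** 2 for w in widths]

# A straight line through a symmetric U has (numerically) zero slope.
linear_slope = ols_slope(widths, effect)

# Regressing on the centered square term recovers the curvature.
squared = [(w - 12.5) ** 2 for w in widths]
quadratic_slope = ols_slope(squared, effect)
```

With real data the pattern is not perfectly symmetric, so the linear slope would be small rather than exactly zero.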

Introducing  Pavement   Width: 

In two-lane highways, the boundary between the lane and
the paved shoulder is just a solid white line. Unlike in
multi-lane highways, this line has little importance in a
two-lane highway since all vehicles have direct access to the
paved shoulder.

In highway sections with more than one lane in each
direction, only vehicles in the outermost lane have direct
access to the paved shoulder. Pavement width is defined as
the sum of lane width and paved shoulder width. For two-lane
highways, pavement width may be considered as the effective
lane width since vehicles can use the paved shoulder without
any restrictions.

Pavement width could be modeled as a single parameter
instead of using lane width, paved shoulder width and the
product term to represent interactions. The model thus
obtained, LW4, is compared with other models to find the best
prediction model among the group. The following listing
shows a brief description of the models considered and Table
7.5 shows model parameters of all models discussed.


#   Model   Lane Width      Rejected by Stepwise
1   LW1     lw              lw
2   LW2     lwc             oups
3   LW3     lw, lw^2        oups, oc
4   LW4     pw = lw + ops   oc

TABLE 7.5  Models for representing Lane Width

Model Parameter       LW1         LW2        LW3        LW4
Null Deviance         2918.41     2954.86    2926.45    2917.64
Residual Deviance     1570.65     1574.12    1571.28    1569.89
Theta                 1.3517      1.3819     1.3583     1.3511
Standard Error        .0898       .0928      .0905      .0897
2*log likelihood      8770.86     8783.48    8773.79    8771.27
AIC                   1590.052    1602.28    1597.22    1589.29
Pred Error Norm1      2.666       2.668      2.665      2.663
Pred Error Norm2      1.8832      1.8832     1.882      1.880

Though LW2, the categorical model, was successful in
revealing the behavior of lane width, the prediction error
did not improve; it became slightly worse, from 2.666 to
2.668. Model LW3, representing the square form of lane width,
is a slight improvement. Model LW4, which represents pavement
width, gave the best results. Therefore LW4 is selected as the
best of all these models.


Parameter Listing: The Model LW4

Parameter        Code       Sum of Sq   Mean Sq    F Value    Pr(F)
Section Length   slen       779.363     779.3627   690.0949   0.00000000
Intersections    its         61.340      61.3403    54.3145   0.00000000
Traffic Volume   adt        207.298     207.2979   183.5541   0.00000000
Speed Limit      spd         36.681      36.6808    32.4794   0.00000001
Parking          pk          10.020      10.0201     8.8724   0.00294298
Pavement Width   pw           9.443       9.4431     8.3615   0.00388939
Raised Curb      oc           3.731       3.7308     3.3035   0.06933926
Product          slen:its    75.747      75.7475    67.0714   0.00000000


Analysis of Shoulder:

The parameters ops, oups and oc, which represent
outside paved shoulder width, outside unpaved shoulder width
and presence of raised curb respectively, have survived the
AIC criterion and found a place in the model. The
coefficients of these parameters suggest that increasing
paved or unpaved shoulder width reduces crash frequency and
that the presence of a raised curb increases crash frequency.
Though these parameters were retained by the AIC criterion,
their standard errors and p-values are very high.


Paved  and  unpaved  shoulders  help  to  increase  safety  of 
any  highway  section.   All  the  models  that  were  developed  in 
the  past  support  this  argument.   But  to  what  extent  is  a 
shoulder  capable  of  reducing  crashes  efficiently?   The  answer 
to  this  question  is  revealed  through  the  analysis  shown  in 
the  following  sections. 

Categorical    treatment   of  shoulder: 

A  sorted  plot  of  shoulder  widths  is  shown  in  Figure 
4.2.   The  pattern  seen  in  the  plots  suggests  that  values  of 
shoulder  widths  are  discrete.   The  value  ranges  from  0-12. 
If  an  indicator  variable  is  assigned  to  represent  each  value 
of  shoulder  width,  the  degrees  of  freedom  will  increase  by 
24.  To  reduce  the  degrees  of  freedom,  values  in  specific 
ranges  are  included  in  the  same  category.   The  following 
listing  shows  how  shoulder  width  parameters  can  be  redefined 
to  reduce  the  total  degrees  of  freedom  to  6. 

#   opsc/    ops/    opsc0/    opsc1/    opsc2/    opsc3/
    oupsc    oups    oupsc0    oupsc1    oupsc2    oupsc3
1   0        0       1         0         0         0
2   3        2-4     0         1         0         0
3   6        5-7     0         0         1         0
4   9        8-12    0         0         0         1

It was observed that both paved and unpaved shoulders
have a strong influence on the model. Estimates of standard
error and p-value for these parameters were found to be low.
The models discussed above are listed below and the model
parameters are shown in Table 7.6.

The paved shoulder showed a behavior similar to that of
lane width. The model statistics are inferior to those of
the model in which pavement width is used. Though this
model is not preferred, the categorical treatment has some
important results to offer, which will be discussed in the
final chapter.
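The grouping of shoulder widths into four categories, as given in the listing earlier in this section, can be sketched as a simple range lookup. The names below are hypothetical, and the returned code is the index of the indicator (opsc0 to opsc3, or oupsc0 to oupsc3) rather than the representative value shown in the opsc/oupsc column.

```python
# (lowest width, highest width, indicator index), following the listing.
SHOULDER_BINS = [(0, 0, 0), (2, 4, 1), (5, 7, 2), (8, 12, 3)]


def shoulder_category(width):
    """Return the indicator index 0..3 for a shoulder width, else None."""
    for low, high, code in SHOULDER_BINS:
        if low <= width <= high:
            return code
    # The listing does not state where a width of 1 belongs, so no
    # category is guessed for values outside the listed ranges.
    return None
```

Each shoulder variable then contributes 3 degrees of freedom (four categories with one as the baseline), which gives the total of 6 mentioned above.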


#   Model   Cross Section
1   SH1     pw
2   SH2     lw, opsc, oups
3   SH3     lw, opsc, oupsc
4   SH4     pw, oupsc

Rejected by Stepwise:  oups;  lw, oups;  lw, oc

TABLE 7.6  Models representing Cross Sectional Factors

Model Parameter       SH1        SH2        SH3        SH4
Null Deviance         2917.64    2929.60    2952.74    2949.70
Residual Deviance     1569.89    1570.36    1573.40    1571.97
Theta                 1.3511     1.3608     1.3798     1.377
Standard Error        .0897      .0906      .0925      .0922
2*log likelihood      8771.27    8776.10    8783.27    8783.35
AIC                   1589.29    1594.12    1597.19    1595.745
Pred Error Norm1      2.6633     2.6661     2.6643     2.6617
Pred Error Norm2      1.8832     1.8836     1.8812     1.8784

Parameter Listing: The Model SH4

Parameter        Code       Sum of Sq   Mean Sq    F Value    Pr(F)
Section Length   slen       782.203     782.2030   694.9421   0.00000000
Intersections    its         63.645      63.6448    56.5448   0.00000000
Traffic Volume   adt        207.663     207.6626   184.4962   0.00000000
Speed Limit      spd         37.945      37.9447    33.7116   0.00000001
Parking          pk           9.600       9.6001     8.5291   0.00354894
Pavement Width   pw          10.081      10.0806     8.9560   0.00281215
Unpaved Shldr    oupsc       12.403       4.1343     3.6731   0.01182596
Product          slen:its    76.999      76.9993    68.4094   0.00000000

The unpaved shoulder, when given categorical treatment,
showed a pattern different from that of both lane width and
paved shoulder width. Besides, this model is a significant
improvement over the former model even though the degrees of
freedom increased by 3. Therefore SH4 is considered the
best model; in it, pavement width is used to represent lane
width and paved shoulder width, and unpaved shoulder width is
expressed as a categorical variable.

Identifying  Significant  Interactions 
The previous sections evaluated the transformation
functions and identified the best way of representing each
parameter in the model. The variables that are assumed to be
independent need not be truly independent. The presence of
powerful interactions among variables can be identified and
measured using their product terms.

A large model is developed from the current model by
allowing all possible second level interactions. Interactions
at level three and above are neglected because the resulting
terms are increasingly complex and difficult to explain.
The resulting model, INT2, is not better than any of the
simpler models discussed in the earlier sections, but it
could lead towards identifying some powerful
interactions. Since this model has a large number of
parameters, it is able to give a good fit with the present
data, but it has a very high prediction error.
Besides, most of the second level interaction parameters are
unexplainable.

Parameter Listing: The Model INT3

Parameter        Code        Sum of Sq   Mean Sq    F Value    Pr(F)
Section Length   slen        799.720     799.7196   712.9671   0.00000000
Intersections    its          62.956      62.9564    56.1270   0.00000000
Traffic Volume   adt         214.338     214.3380   191.0869   0.00000000
Speed Limit      spd          43.483      43.4826    38.7657   0.00000000
Parking          pk            7.774       7.7741     6.9308   0.00856236
Pavement Width   pw           11.006      11.0057     9.8118   0.00176869
Unpaved Shldr    oupsc        13.405       4.4683     3.9836   0.00771230
Product          slen:its     81.607      81.6067    72.7541   0.00000000
Product          slen:spd     15.395      15.3945    13.7245   0.00021963
Product          its:spd       3.214       3.2137     2.8651   0.09073489
Product          adt:oupsc    21.650       7.2166     6.4337   0.00025089
Product          spd:pk       18.935      18.9347    16.8806   0.00004203


Even though this model cannot be accepted over any of
the other models, it gives some powerful insights into a few
important interaction terms. Such terms are identified by
screening this full model through the stepwise filter. The
resulting smaller model, INT3, has reduced the prediction
error considerably compared to models INT1 and INT2.
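The set of second level interactions allowed in the full model can be enumerated as all pairs of main terms. The sketch below assumes the seven main terms of model SH4; the variable names are illustrative.

```python
from itertools import combinations

# Main terms of the model carried forward from the previous section
# (assumed here to be those of SH4).
main_terms = ["slen", "its", "adt", "spd", "pk", "pw", "oupsc"]

# Every unordered pair of main terms gives one candidate product term,
# written with ":" as in the parameter listings.
pairs = [f"{a}:{b}" for a, b in combinations(main_terms, 2)]
```

Seven main terms give 21 candidate product terms. The number of added coefficients is larger than 21 because each product involving the categorical variable oupsc carries 3 degrees of freedom.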


#   Model   Characteristics
1   INT1    best model from previous section
2   INT2    all second degree interactions
3   INT3    model selected by stepwise regression
4   INT4    spd:its removed manually

TABLE 7.7  Models representing Second Degree Interactions

Model Parameter       INT1        INT2       INT3       INT4
Null Deviance         2949.70     3177.45    3077.12    3070.09
Residual Deviance     1571.97     1579.94    1573.31    1573.64
Theta                 1.377       1.5755     1.4884     1.4816
Standard Error        .0922       .1108      .1025      .1018
2*log likelihood      8783.35     8870.79    8836.49    8833.24
AIC                   1595.745    1673.16    1610.23    1608.36
Pred Error Norm1      2.6617      2.6748     2.6545     2.6531
Pred Error Norm2      1.8784      1.8964     1.8746     1.8735

Model INT3 is further checked to see if there are
interactions which are very weak and unexplainable. Such
variables are removed and the model is checked to see if the
removal improves the prediction error. The interaction
between speed limit and number of intersections is very weak
and has a very high p-value. Removing this term (INT4) has
further improved the prediction quality. The important
models in this series of analyses and their model parameters
are shown in Table 7.7 and the accompanying listing.


Parameter Listing: The Model INT4

Parameter        Code        Sum of Sq   Mean Sq    F Value    Pr(F)
Section Length   slen        799.117     799.1168   712.8898   0.000000000
Intersections    its          65.146      65.1457    58.1163   0.000000000
Traffic Volume   adt         214.570     214.5702   191.4175   0.000000000
Speed Limit      spd          43.456      43.4564    38.7674   0.000000001
Parking          pk            7.954       7.9544     7.0961   0.007810734
Pavement Width   pw           10.908      10.9084     9.7314   0.001847196
Unpaved Shldr    oupsc        13.497       4.4990     4.0135   0.007400247
Product          slen:its     81.778      81.7782    72.9541   0.000000000
Product          slen:spd     15.911      15.9113    14.1945   0.000171459
Product          adt:oupsc    20.814       6.9379     6.1893   0.000354112
Product          spd:pk       18.252      18.2516    16.2822   0.000057421


The Final Model
The previous sections of this chapter displayed a
series of regression models in stages. Each stage of
improvement was supported by improvement in model parameters
and justified by a corresponding reduction in the mean
prediction error. The final model selected from this series
of analyses is INT4.

Table 7.8  Observed vs. Predicted Values

     Crash Frequency Absolute         Crash Frequency Absolute         Crash Frequency Absolute
  #  Actual  Predicted  Error      #  Actual  Predicted  Error      #  Actual  Predicted  Error

/ 

1 

1 

0 

56 

2 

L      2 

0 

111 

2 

3 

2 

3 

3 

0 

57 

1 

1 

0 

112 

0 

1 

J 

4 

4 

0 

58 

2 

2 

0 

113 

I 

2 

4 

1 

1 

0 

59 

6 

6 

r-Q 

114 

0 

1 

5 

1 

1 

0 

60 

! 

1 

"~    0 

115 

1 

2 

6 

2 

2 

0 

61 

6 

6 

0 

116 

3 

4 

7 

3 

3 

0 

62 

5 

5 

0 

117 

0 

1 

8 

7 

7 

0 

63 

1 

1 

0 

118 

1 

2 

9 

0 

0 

0 

64 

2 

2 

0 

119 

3 

2 

10 

6 

6 

0 

65 

1 

1 

0 

120 

0 

1 

11 

1 

1 

0 

66 

2 

2 

0 

121 

1 

2 

12 

2 

2 

0 

67 

1 

1 

0 

122 

0 

1 

13 

0 

0 

0 

68 

14 

14 

0 

123 

1 

2 

14 

0 

0 

0 

69 

1 

1 

0 

124 

2 

1 

15 

1 

1 

0 

70 

3 

4 

125 

0 

1 

16 

0 

0 

0 

71 

0 

1 

126 

0 

1 

17 

0 

0 

0 

72 

2 

1 

127 

0 

1 

18 

3 

3 

0 

73 

0 

1 

f28~ 

2 

1 

19 

0 

0 

0 

74 

0 

1 

129 

0 

1 

20 

1 

1 

0 

75 

0 

1 

130 

7 

6 

21 

1 

1 

0 

76 

5 

6 

131 

0 

1 

22 

0 

0 

0 

7/ 

9 

8 

132 

0 

1 

23 

1 

1 

0 

78 

0 

1 

133 

0 

1 

24 

1 

1 

0 

79 

8 

7 

134 

2 

3 

25 

1 

1 

0 

80 

0 

1 

135 

0 

1 

26 

2 

2 

0 

81 

0 

1 

136 

0 

I 

27 

1 

1 

0 

82 

5 

4 

137 

2 

1 

28 

1 

1 

0 

83 

0 

1 

138 

1 

2 

29 

1 

1 

0 

84 

0 

1 

139 

2 

1 

30 

1 

1 

0 

85 

0 

1 

140 

0 

1 

31 

0 

0 

0 

86 

3 

2 

141 

0 

1 

32 

2 

2 

0 

87 

0 

1 

142 

1 

2 

33 

9 

9 

0 

88 

0 

1 

143 

3 

2 

34 

1 

1 

0 

89 

0 

1 

144 

1 

2 

35 

1 

1 

0 

90 

2 

1 

145 

0 

1 

36 

1 

1 

0 

91 

0 

1 

146 

1 

2 

37 

1 

1 

0 

92 

0 

1 

147 

0 

38 

1 

1 

0 

93 

0 

1 

148 

0 

39 

1 

1 

0 

94 

0 

1 

149 

2 

40 

3 

3 

0 

95 

0 

1 

150 

0 

41 

6 

6 

0 

96 

0 

1 

151 

0 

42 

2 

2 

0 

97 

0 

1 

152 

2 

43 

1 

1 

0 

98 

0 

1 

153 

0 

44 

0 

0 

0 

99 

3 

2 

154 

0 

45 

1 

1 

0 

100 

0 

1 

155 

0 

46 

2 

2 

0 

101 

0 

1 

156 

3 

4 

47 

1 

1 

0 

102 

0 

1 

157 

0 

48 

1 

1 

0 

103 

0 

1 

158 

0 

49 

1 

1 

0 

104 

2 

1 

159 

2 

50 

0 

0 

0 

105 

0 

1 

160 

0 

51 

1 

1 

0 

106 

1 

2 

161 

1 

2 

52 

0 

0 

0 

107 

0 

1 

162 

0 

53 

0 

0 

0 

108 

0 

1 

163 

0 

54 

1 

1              0 

109 

3 

2 

164 

3 

2 

55 

2 

2              0 

110 

3 

4      r 

165 

13 

12 

Table 7.8  Observed vs. Predicted (Contd...)

     Crash Frequency Absolute         Crash Frequency Absolute         Crash Frequency Absolute
  #  Actual  Predicted  Error      #  Actual  Predicted  Error      #  Actual  Predicted  Error

166 

2 

1 

221 

0 

276 

0 

2 

2 

167 

5 

4 

222 

0 

277 

0 

2 

2 

168 

2 

223 

0 

278 

9 

11 

2 

169 

0 

224 

2 

279 

4 

2 

2 

170 

0 

1 

225 

0 

280 

2 

4 

2 

171 

2 

226 

0 

281 

3 

1 

2 

172 

6 

227 

0 

282 

6 

8 

2 

17  i 

3 

2 

228 

2 

283 

3 

1 

2 

174 

4 

3 

229 

0 

284 

3 

1 

2 

175 

4 

5 

230 

1 

285 

2 

4 

2 

176 

3 

2 

231 

0 

286 

0 

2 

2 

177 

0 

1 

232 

0 

287 

0 

2 

2 

178 

3 

4 

233 

0 

288 

0 

2 

2 

179 

0 

234 

3 

289 

1 

3 

2 

180 

2 

235 

0 

290 

7 

5 

2 

181 

3 

2 

236 

0 

291 

4 

2 

2 

182 

2 

237 

0 

292 

1 

3 

2 

0 

238 

1 

2 

293 

3 

1 

2 

184 

0 

239 

2 

294 

0 

2 

2 

185 

0 

1 

240 

0 

295 

3 

1 

2 

186 

0 

241 

3 

4 

296 

3 

5 

2 

187 

1 

0 

242 

8 

7 

297 

8 

6 

2 

188 

0 

243 

1 

2 

1 

298 

5 

3 

^  2 

189 
190' 

0 

244 

4 

3 

299 

2 

4 

2 

0 

245 

2 

3 

300 

2 

4 

2 

191 

2 

246 

1 

2 

301 

4 

2 

2 

192 

2 

247 

3 

4 

302 

2 

4 

2 

193 

0 

248 

0 

1 

303 

0 

2 

2 

194 

0 

249 

5 

4 

304 

0 

2 

2 

193 

0 

250 

0 

1 

305 

0 

2 

'     2 

196 

0 

251 

0 

1 

306 

0 

2 

2 

197 

2 

252 

3 

4 

307 

1 

3 

2 

198 

0 

253 

0 

1 

308 

0 

2 

2 

199 

3 

4 

254 

0 

1 

309 

0 

2 

2 

200 

« 

255 

2 

3 

310 

0 

2 

2 

201 

0 

256 

2 

1 

311 

4 

2 

2 

202 

0 

257 

0 

1 

312 

3 

1 

2 

203 

0 

258 

0 

1 

313 

3 

1 

2 

204 

'0     " 

259 

3 

2 

314 

0 

2 

2 

205 

0 

260 

1 

2 

315 

0 

2 

2 

206 

0 

261 

2 

I 

316 

0 

2 

2 

207 

0 

262 

0 

1 

317 

4 

6 

2 

208 

0 

263 

0 

1 

318 

0 

2 

2 

209 

0 

264 

0 

1 

319 

1 

3 

2 

210 

2 

265 

2  '] 

3 

320 

1 

3 

2 

211 

0 

266 

4 

3 

321 

5 

3 

2 

212 

0 

267 

2 

3 

322 

0 

2 

2 

213 

0 

268 

7 

6 

323 

1 

3 

2 

214 

0 

269 

3 

2 

324 

0 

2 

2 

215 

0 

270 

1 

2 

325 

3 

1 

2 

216 

0 

271 

1 

2 

326 

6 

4 

2 

217 

0 

272 

0 

1 

327 

I 

3 

2     " 

218 

0 

273 

0 

2 

2 

328 

0 

2 

2 

219 

0 

274 

5 

3 

2 

329 

2 

4 

2 

220 

0 

275 

8 

6 

2 

330 

0 

2 

2

Table 7.8  Observed vs. Predicted (Contd...)

     Crash Frequency Absolute         Crash Frequency Absolute         Crash Frequency Absolute
  #  Actual  Predicted  Error      #  Actual  Predicted  Error      #  Actual  Predicted  Error
331       4          6      2    377       2          6      4    423       8          1      7
332      13         11      2    378       7          3      4    424       8          1      7
333       9          7      2    379       5          1      4    425       3         11      8
334      10          8      2    380       0          4      4    426      15          7      8
335       0          2      2    381       0          4      4    427       5         13      8
336       2          4      2    382       1          5      4    428       8         16      8
337       0          2      2    383       7          3      4    429       0          8      8
338       5          2      3    384       1          5      4    430       2         10      8
339       4          1      3    385      12         16      4    431      11          3      8
340       1          4      3    386       6          2      4    432       0          8      8
341       6          3      3    387       6         10      4    433      23         31      8
342       0          3      3    388       0          4      4    434      13          5      8
343       0          3      3    389       7          2      5    435      11         20      9
344       0          3      3    390      10         15      5    436       7         16      9
345       4          1      3    391       0          5      5    437       5         14      9
346       4          1      3    392       2          7      5    438      14          5      9
347       6          3      3    393       4          9      5    439       8         18     10
348      10          7      3    394       2          7      5    440      14          4     10
349       0          3      3    395       7          2      5    441       5         15     10
350       3          6      3    396       0          5      5    442      14          3     11
351       4          7      3    397       7          2      5    443      12          1     11
352       4          1      3    398      21         26      5    444       7         18     11
353       4          1      3    399       2          7      5    445      18          6     12
354       1          4      3    400       7          2      5    446      19          7     12
355       0          3      3    401       0          5      5    447      15         28     13
356       0          3      3    402      12         17      5    448      21          8     13
357       2          5      3    403      10          5      5    449      16          3     13
358       0          3      3    404       8          3      5    450       6         19     13
359       0          3      3    405       9          4      5    451      18          5     13
360       6          3      3    406       6          1      5    452      15          2     13
361      12          9      3    407       2          7      5    453      23         10     13
362       2          5      3    408       7          1      6    454      29         43     14
363       7          4      3    409      11          5      6    455      17         31     14
364       0          3      3    410      18         24      6    456      23          9     14
365       7          3      4    411       3          9      6    457      29         14     15
366       0          4      4    412       1          7      6    458      21          6     15
367       3          7      4    413       9          3      6    459       1         17     16
368       0          4      4    414       5         11      6    460       7         25     18
369       5          1      4    415       8         14      6    461      13         33     20
370       4          0      4    416       4         10      6    462      30          8     22
371       5          1      4    417       1          7      6    463       3         27     24
372       5          1      4    418       5         11      6    464       4         30     26
373       4          8      4    419       9          3      6    465       0         29     29
374       7          3      4    420       4         11      7    466       9         45     36
375      12          8      4    421      11         18      7    467      10         50     40
376       7          3      4    422       4         11      7    468       6         46     40

Average prediction error on 90% of the data = .45 crashes/section

Average prediction error on 95% of the data = .54 crashes/section

% of observations for which average prediction error <= 1 is 100%

Mean Absolute Error on 90% of the data = 1.65 crashes/section

Mean Absolute Error on 95% of the data = 2.03 crashes/section

% of observations for which mean absolute prediction error <= 1 is 72.44%

Model testing was performed at each stage of model
development using the norms developed from test data. Such
tests were used to compare the relative performance of each
model at various stages.

The final model is used to predict the crash frequency
of the highway sections in the test data. The actual number
of crashes observed, the crash frequency predicted by the
model, and the difference between the actual and predicted
values, which represents the prediction error, are displayed
in Table 7.8.

The prediction error was found to be high for high
accident locations. Out of 468 highway sections in the test
data, crash frequencies at 69 highway sections were predicted
accurately with zero error, crash frequencies at 272 sections
were predicted within a range of 1 crash/section/year and 337
highway sections were predicted within a range of 2
crashes/section/year.

Of the remaining 131 highway sections, 103 were high
crash sections and they were all predicted as high crash
locations. Only 27 sections, which make up about 5% of
the data, had unreliable predictions.
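The error summaries quoted with Table 7.8 can be computed from pairs of actual and predicted values as sketched below. The five pairs are rows 1, 68, 331, 380 and 468 of Table 7.8, chosen only for illustration; they are not representative of the whole test set, and the function names are hypothetical.

```python
def absolute_errors(actual, predicted):
    """Absolute difference between observed and predicted crash counts."""
    return [abs(a - p) for a, p in zip(actual, predicted)]


def mean_absolute_error(errors):
    """Average of the absolute errors, in crashes per section."""
    return sum(errors) / len(errors)


def share_within(errors, tolerance):
    """Fraction of sections whose absolute error is <= tolerance."""
    return sum(1 for e in errors if e <= tolerance) / len(errors)


# Rows 1, 68, 331, 380 and 468 of Table 7.8 (illustrative sample only).
actual = [1, 14, 4, 0, 6]
predicted = [1, 14, 6, 4, 46]

errors = absolute_errors(actual, predicted)
mae = mean_absolute_error(errors)
within_two = share_within(errors, 2)
```

Because a few high crash sections carry very large errors, the mean over all sections is much larger than the mean over the best 90% or 95% of sections, which is why the trimmed summaries are reported.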


CHAPTER  8 
RESULTS  AND  DISCUSSION 


The previous chapter dealt with some of the
important issues concerning regression analysis and modeling
of crash frequencies. The models presented in the previous
chapter were developed using 75% of the data, and the other
25% of the data was used to test the models and to compare
their prediction performance. Using the procedure
described in the previous chapter, crash prediction models
were developed for various types of crashes. The entire
data set was used to develop these models.
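A 75%/25% division of the highway sections into model-building and test data can be sketched as follows. The random shuffle, the seed and the total of 1872 sections are assumptions made for illustration (1872 is simply four times the 468 test sections reported in the previous chapter); the way the sections were actually divided is not stated here.

```python
import random


def split_sections(section_ids, train_share=0.75, seed=0):
    """Shuffle the section ids and split them into train and test lists."""
    ids = list(section_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split repeatable
    cut = round(len(ids) * train_share)
    return ids[:cut], ids[cut:]


# ASSUMED total of 1872 sections, so that 25% corresponds to 468.
train, test = split_sections(range(1872))
```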

The final crash prediction models are shown in the
following sections. The first model is developed from total
crash frequency and the following three models are developed
from subsets of total crashes classified by crash severity.
The Florida Department of Transportation has specifications
for classifying crashes based on severity. According to
their classification, a highway crash may be identified as
one of three classes: property damage only, injury
crashes and fatal crashes. The sum of these three types of
crashes gives the total number of crashes.

All Crashes
The total number of crashes that occur in a two-lane
urban highway section is modeled as a function of the
section length, traffic volume, number of intersections,
speed limit, presence of on-street parking, pavement width,
and unpaved shoulder width. The model parameters listed in
the following sections are taken from S+ analysis output.


I.  The Model:

acc  ~  slen  +  its  +  adt  +  spd  +  pk  +  pw  +  oupsc  +  slen*its  + 

slen*spd  +  adt*oupsc  +  spd*pk, 
data  =  u,  init.theta  =  1.39889 
family  =  negative  binomial,  link  =  log 


II.  Model Coefficients:

Variable              Value     Std. Error      t value
(Intercept)  -3.582133e-001  2.839186e-001   -1.2616761
slen          5.744542e-003  5.683548e-004   10.1073166
its           1.651064e-001  1.736173e-002    9.5097901
adt           6.723726e-005  5.440308e-006   12.3590900
spd          -4.227103e-003  5.253567e-003   -0.8046158
pk            2.529681e+000  5.974709e-001    4.2339810
pw           -4.311935e-002  1.302667e-002   -3.3100824
oupsc1        9.256181e-002  1.287807e-001    0.7187552
oupsc2       -2.159337e-001  5.789664e-002   -3.7296422
oupsc3       -5.313014e-003  3.342122e-002   -0.1589713
slen:its     -1.863185e-004  1.854455e-005  -10.0470742
slen:spd     -5.470994e-005  1.166144e-005   -4.6915269
adt:oupsc1    8.575736e-006  9.655090e-006    0.8882088
adt:oupsc2    1.747212e-005  4.029142e-006    4.3364370
adt:oupsc3   -3.184532e-007  2.300303e-006   -0.1384397
spd:pk       -0.0642617      0.0181015       -3.55006
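
Each t value in the Model Coefficients listing is the coefficient estimate divided by its standard error. The short Python sketch below recomputes a few of them as a consistency check; the dictionary and function names are illustrative and are not part of the S-PLUS output.

```python
# Estimates and standard errors copied from the Model Coefficients
# listing of the total crash model (only a few rows, for illustration).
coefficients = {
    "slen": (5.744542e-03, 5.683548e-04),
    "adt": (6.723726e-05, 5.440308e-06),
    "pk": (2.529681e+00, 5.974709e-01),
    "spd:pk": (-0.0642617, 0.0181015),
}


def t_value(estimate, std_error):
    """Return the ratio of a coefficient estimate to its standard error."""
    return estimate / std_error


t_values = {name: t_value(est, se) for name, (est, se) in coefficients.items()}

for name, t in t_values.items():
    print(f"{name:8s} t = {t:9.4f}")
```

The recomputed ratios agree with the listed t values to the precision shown, which also confirms how the three columns of the listing line up.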

III.  Sum of Squares Table:

Variable    Df  Sum of Sq   Mean Sq   F Value       Pr(F)
slen         1    982.958  982.9577  861.8348  0.00000000
its          1     75.908   75.9076   66.5540  0.00000000
adt          1    289.849  289.8490  254.1330  0.00000000
spd          1     48.244   48.2438   42.2990  0.00000000
pk           1     15.573   15.5735   13.6545  0.00022588
pw           1     15.203   15.2030   13.3297  0.00026824
oupsc        3     11.722    3.9075    3.4260  0.01653190
slen:its     1    108.481  108.4806   95.1133  0.00000000
slen:spd     1     19.717   19.7173   17.2877  0.00003354
adt:oupsc    3     24.353    8.1176    7.1173  0.00009388
spd:pk       1     12.603   12.6029   11.0500  0.00090359


IV.  Analysis of Deviance Table

Variable    Df  Deviance  Resid. Df  Resid. Dev     Pr(Chi)
NULL                           1933    3879.361
slen         1  1153.160       1932    2726.200  0.00000000
its          1    79.215       1931    2646.985  0.00000000
adt          1   318.403       1930    2328.582  0.00000000
spd          1    50.483       1929    2278.099  0.00000000
pk           1    18.876       1928    2259.223  0.00001395
pw           1    14.168       1927    2245.055  0.00016717
oupsc        3    11.159       1924    2233.896  0.01089413
slen:its     1   108.505       1923    2125.391  0.00000000
slen:spd     1    15.735       1922    2109.656  0.00007285
adt:oupsc    3    23.145       1919    2086.511  0.00003767
spd:pk       1    15.780       1918    2070.731  0.00007117
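
In the Analysis of Deviance table, each Deviance entry is the drop in residual deviance when the term is added, and Pr(Chi) is the upper-tail chi-square probability of that drop on the term's degrees of freedom. The Python sketch below reproduces three rows using closed-form tail probabilities for 1 and 3 degrees of freedom; the function name is hypothetical and the code is only a check on the listing, not the procedure S-PLUS itself runs.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and 3 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("this sketch handles only df = 1 and df = 3")


# (term, df, residual deviance after adding the term), from the listing.
rows = [("pk", 1, 2259.223), ("pw", 1, 2245.055), ("oupsc", 3, 2233.896)]
previous_deviance = 2278.099  # residual deviance after adding spd

p_values = {}
for term, df, resid_dev in rows:
    drop = previous_deviance - resid_dev      # the "Deviance" column
    p_values[term] = chi2_upper_tail(drop, df)
    print(f"{term:6s} deviance = {drop:7.3f}  Pr(Chi) = {p_values[term]:.8f}")
    previous_deviance = resid_dev
```

The results agree with the listed Pr(Chi) values for pk, pw and oupsc to within rounding of the printed deviances.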


V.  Model  Statistics: 

Null  Deviance:  3879.361  on  1933  degrees  of  freedom 

Residual  Deviance:  2070.731  on  1918  degrees  of  freedom 

Theta:   1.39889       Std.  Err.:   0.08232 

2  X  log-likelihood:  10922.42761        AIC=  2105.279 
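
The Model Statistics listing does not say how the AIC figure was computed. The reported values are numerically consistent with the residual deviance plus twice the number of fitted coefficients times a deviance-based dispersion estimate (residual deviance divided by residual degrees of freedom). The Python sketch below checks that relationship; the formula and the function name are assumptions inferred from the numbers, not something stated in the S-PLUS output. The PDO and injury figures are taken from the corresponding listings later in this chapter.

```python
N_COEFFICIENTS = 16  # intercept plus 15 terms in each Model Coefficients listing


def aic_like_statistic(resid_deviance, resid_df, n_coefficients=N_COEFFICIENTS):
    """Residual deviance penalized by 2 * p * (deviance-based dispersion).

    Assumed form, inferred from the reported numbers.
    """
    dispersion = resid_deviance / resid_df
    return resid_deviance + 2.0 * n_coefficients * dispersion


# (residual deviance, residual df, AIC as reported in the listings)
models = {
    "total": (2070.731, 1918, 2105.279),
    "pdo": (1792.313, 1918, 1822.216),
    "injury": (1898.094, 1918, 1929.761),
}

computed = {}
for name, (dev, df, reported) in models.items():
    computed[name] = aic_like_statistic(dev, df)
    print(f"{name:7s} computed = {computed[name]:9.3f}  reported = {reported:9.3f}")
```

If the relationship holds, a term earns its place in the model only when it lowers the residual deviance by more than about twice the dispersion estimate per degree of freedom it uses.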

The statistical fitting was performed with generalized
linear model procedures. The negative binomial distribution
was assumed, with the natural log as the link function. Any
parameter that did not satisfy the AIC criterion was
removed.

The terms that involve the outside unpaved shoulder have 3
degrees of freedom, which correspond to the four levels the
variable can assume as a categorical variable. All p-values
in the sum of squares and analysis of deviance tables are
less than .05, which shows that each term is significant at
a confidence level greater than 95%.

Crashes  with  Property  Damage  Only 
All crashes that satisfy the conditions for "crashes with
property damage only" make up the response variable in the
following model. The predictors selected by the stepwise
procedure were identical to those in the total crash model.
Model coefficients and statistics taken from the S-PLUS
output are given in the following sections.


I.  The  Model: 

pdo  ~  slen  +  its  +  adt  +  spd  +  pk  +  pw  +  oupsc  +  slen*its 

+  slen*spd  +  adt*oupsc  +  spd*pk,
data  =  u,  init.theta  =  1.19283 
family  =  negative  binomial,  link  =  log 


II.  Model Coefficients:

Variable              Value     Std. Error      t value
(Intercept)  -8.134176e-001  3.590245e-001   -2.2656326
slen          4.599383e-003  6.800011e-004    6.7637883
its           1.604338e-001  2.065559e-002    7.7670903
adt           6.739408e-005  6.728370e-006   10.0164053
spd          -1.142828e-002  6.755102e-003   -1.6917993
pk            2.915509e+000  7.402966e-001    3.9382991
pw           -4.317603e-002  1.609964e-002   -2.6818011
oupsc1        1.683090e-001  1.596143e-001    1.0544732
oupsc2       -2.240835e-001  7.297487e-002   -3.0706933
oupsc3       -3.744079e-002  4.220611e-002   -0.8870943
slen:its     -1.670767e-004  2.200535e-005   -7.5925490
slen:spd     -3.849744e-005  1.400834e-005   -2.7481802
adt:oupsc1    3.508723e-006  1.187442e-005    0.2954858
adt:oupsc2    1.467654e-005  5.001071e-006    2.9346793
adt:oupsc3    1.190906e-006  2.842475e-006    0.4189681
spd:pk       -0.07278249     0.02269562      -3.206897


III.  Sum of Squares Table:

Variable    Df  Sum of Sq   Mean Sq   F Value       Pr(F)
slen         1    488.037  488.0375  418.1057  0.00000000
its          1     64.779   64.7789   55.4967  0.00000000
adt          1    166.473  166.4727  142.6185  0.00000000
spd          1     54.096   54.0958   46.3443  0.00000000
pk           1     23.585   23.5851   20.2056  0.00000737
pw           1      5.964    5.9639    5.1093  0.02390909
oupsc        3     13.221    4.4070    3.7755  0.01022712
slen:its     1     61.613   61.6132   52.7846  0.00000000
slen:spd     1      6.163    6.1625    5.2795  0.02168529
adt:oupsc    3     10.670    3.5567    3.0471  0.02771363
spd:pk       1     10.284   10.2842    8.8105  0.00303205

IV.  Analysis of Deviance Table

Variable    Df  Deviance  Resid. Df  Resid. Dev     Pr(Chi)
NULL                           1933    2849.347
slen         1  607.4972       1932    2241.850  0.00000000
its          1   69.9621       1931    2171.888  0.00000000
adt          1  182.4089       1930    1989.479  0.00000000
spd          1   59.0801       1929    1930.399  0.00000000
pk           1   27.7949       1928    1902.604  0.00000013
pw           1    5.2488       1927    1897.355  0.02196189
oupsc        3   14.3160       1924    1883.039  0.00250510
slen:its     1   62.9998       1923    1820.039  0.00000000
slen:spd     1    5.2794       1922    1814.760  0.02157881
adt:oupsc    3   11.0021       1919    1803.757  0.01171443
spd:pk       1   11.4445       1918    1792.313  0.00071705


V.  Model  Statistics: 

Null  Deviance:  2849.347  on  1933  degrees  of  freedom 

Residual  Deviance:  1792.313  on  1918  degrees  of  freedom 

Theta:   1.19283       Std.  Err.:   0.09404 

2  X  log-likelihood:   -568.86389        AIC=  1822.216 


The statistical fitting was performed with generalized
linear model procedures. The negative binomial distribution
was assumed, with the natural log as the link function. Any
parameter that did not satisfy the AIC criterion was
removed. The value of theta fell to 1.1928, which implies
that the overdispersion is less than that of the total
crash frequency. Since the value of theta is greater than
1, the assumption of a negative binomial distribution is
validated.

Injury Crashes

The crash prediction model for injury crashes was also
found to be similar to those for total crashes and property
damage crashes in terms of model structure and the
parameters selected by the stepwise regression.

The variable pk (on-street parking) contributed very
little to the model and had a p-value slightly greater than
0.05. The variable was nevertheless retained, since its
presence reduced the AIC value considerably. Model
coefficients and statistics taken from the S-PLUS output
are given in the following sections.


I.  The  Injury  Prediction  Model: 

inj  ~  slen  +  its  +  adt  +  spd  +  pk  +  pw  +  oupsc  +  slen*its 

+  slen*spd  +  adt*oupsc  +  spd*pk 
data  =  u,  init.theta  =  1.53409 
family  =  negative  binomial,  link  =  log 


II.  Model Coefficients:

Variable              Value     Std. Error       t value
(Intercept)  -1.120027e+000  3.227955e-001   -3.46977398
slen          6.158866e-003  5.938955e-004   10.37028549
its           1.574267e-001  1.797818e-002    8.75654130
adt           6.382965e-005  6.207825e-006   10.28212739
spd           1.493376e-003  5.958551e-003    0.25062735
pk            2.042316e+000  6.903215e-001    2.95849985
pw           -4.410585e-002  1.457123e-002   -3.02691294
oupsc1       -5.698699e-002  1.466253e-001   -0.38865735
oupsc2       -2.033925e-001  6.460879e-002   -3.14806185
oupsc3       -1.159176e-003  3.712219e-002   -0.03122594
slen:its     -1.781309e-004  1.908024e-005   -9.33588415
slen:spd     -6.270856e-005  1.220027e-005   -5.13993377
adt:oupsc1    1.969406e-005  1.118039e-005    1.76148275
adt:oupsc2    1.929660e-005  4.528819e-006    4.26084722
adt:oupsc3   -4.570840e-007  2.568147e-006   -0.17798199
spd:pk       -0.05359911     0.02096281      -2.556867


III.  Sum of Squares Table:

Variable    Df  Sum of Sq   Mean Sq   F Value       Pr(F)
slen         1    931.763  931.7629  861.9966  0.00000000
its          1     59.140   59.1396   54.7115  0.00000000
adt          1    255.198  255.1984  236.0903  0.00000000
spd          1     25.213   25.2135   23.3256  0.00000148
pk           1      2.973    2.9727    2.7501  0.09741066
pw           1     18.187   18.1869   16.8251  0.00004270
oupsc        3      9.360    3.1199    2.8863  0.03445649
slen:its     1     91.595   91.5947   84.7365  0.00000000
slen:spd     1     24.852   24.8521   22.9913  0.00000175
adt:oupsc    3     24.269    8.0898    7.4840  0.00005576
spd:pk       1      6.538    6.5376    6.0481  0.01400925


IV.  Analysis of Deviance Table

Variable    Df  Deviance  Resid. Df  Resid. Dev     Pr(Chi)
NULL                                   3547.693
slen                                   2444.488  0.00000000
its                                    2386.951  0.00000000
adt                                    2105.877  0.00000000
spd                                    2083.311  0.00000203
pk                                     2080.060  0.07136516
pw                                     2061.762  0.00001890
slen:its     1   104.977       1923    1948.139  0.00000000
slen:spd     1    20.397       1922    1927.742  0.00000629
adt:oupsc    3    22.213       1919    1905.529  0.00005890
spd:pk       1     7.435       1918    1898.094  0.00639606

V.  Model  Statistics: 

Null  Deviance:  3547.693  on  1933  degrees  of  freedom 

Residual  Deviance:  1898.094  on  1918  degrees  of  freedom 

Theta:   1.5341  Std.  Err.:   0.11137 

2  X  log-likelihood:   2250.23304 

Start:   AIC=  1929.761 


The statistical fitting was performed with generalized
linear model procedures. The negative binomial distribution
was assumed, with the natural log as the link function. Any
parameter that did not satisfy the AIC criterion was
removed. The value of theta was found to be close to that
of the total crash model.


Fatal  Crashes 
The negative binomial model for fatal crash frequencies
failed to converge to a single solution. In addition, the
dispersion coefficient was found to be less than one and
the mean was very close to the variance, so the Poisson
distribution was used for this model instead. Model
coefficients and statistics taken from the S-PLUS output
are given in the following sections.

I.  The Fatal Crash Prediction Model:

fat  ~  slen  +  its  +  adt  +  spd  +  pw  +  slen*its  +  slen*spd 
family  =  Poisson,   link  =  log 

II.  Model Coefficients:

Variable               Value     Std. Error    t value
(Intercept)  -5.92945695025  1.29439515643  -4.580871
slen          0.00724953631  0.00179795119   4.032110
its           0.06399140839  0.05711297790   1.120436
adt           0.00004689686  0.00001458896   3.214545
spd           0.06702686666  0.02257804788   2.968674
pw           -0.11473662851  0.05273933695  -2.175542
slen:its     -0.00011108824  0.00005325787  -2.085856
slen:spd     -0.00010094330  0.00003622215  -2.786784

III.  Sum of Squares Table:

Variable    Df  Sum of Sq   Mean Sq   F Value      Pr(F)
slen               80.084  80.08415  86.84743  0.0000000
its                 0.129   0.12939   0.14032  0.7080077
adt                         9.45833  10.25711  0.0013838
spd                         1.74890   1.89660  0.1686197
pw                          6.08691   6.60096  0.0102670
slen:its                    2.92306   3.16992  0.0751637
slen:spd                    7.76616   8.42203  0.0037491


IV.  Analysis of Deviance Table

Variable    Df  Deviance  Resid. Df  Resid. Dev
NULL                           1933    650.2526
slen                           1932    576.0285
its                            1931    575.6782
adt                            1930    566.7285
spd                            1929    563.7999
pw                             1928    557.2133
slen:its     1   4.34202       1927    552.8713
slen:spd     1   8.07612       1926    544.7951

V.  Model  Statistics: 

Null  Deviance:  650.2526  on  1933  degrees  of  freedom 

Residual  Deviance:  544.7951  on  1926  degrees  of  freedom 


The statistical fitting was performed with generalized
linear model procedures. The Poisson distribution was
assumed, with the natural log as the link function. Any
parameter that did not satisfy the AIC criterion was
removed.

A Brief Overview of the Models
Four crash prediction models were presented in the
previous sections. These models can be used to predict
total crashes, PDO crashes, injury crashes and fatal crashes
for two-lane urban highway sections in Florida. The
importance of each parameter in the models is discussed in
the following sections. When one variable is varied to
calculate crash frequencies, all other variables are held
constant at their medians.

Longitudinal    Factors: 

Section length and the number of intersections, which were
originally included in the base model as longitudinal
factors, were found to be significant in all models. The
interaction between these two variables was also found to be
highly significant in all models. The coefficient of the
product term was consistently negative, which suggests that
longer sections are able to handle a larger number of
intersections than shorter sections. As section length
increased from .05 miles to 1.2 miles, the total crash
frequency increased from .341 to 5.096, the PDO crash
frequency increased from .144 to 1.486, the injury crash
frequency increased from .192 to 3.201 and the fatal crash
frequency increased from .005 to .063.


[Horizontal axis: Section Length (feet)]

FIGURE 8.1  Crash Frequencies vs. Section Length (AADT =
15000, Speed Limit = 45, Intersections = 5, Parking =
Present, Pavement Width = 18', Unpaved Shoulder = 3')

Figure 8.1 shows the curves obtained from plotting
crash frequencies at various values of section length. For
every .25 mile increase in section length, the total crash
frequency increased by 80.00%, the PDO crash frequency
increased by 66.18%, the injury crash frequency increased by
84.33%, and the fatal crash frequency increased by 71.24%.
The relationship is non-linear.
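
The text does not state how the per-step percentages were derived. They are closely matched by a constant compound rate between the two endpoint frequencies quoted for section lengths of .05 and 1.2 miles, which is the assumption used in the Python sketch below; the function name is illustrative.

```python
def percent_change_per_step(start_value, end_value, start_x, end_x, step):
    """Constant compound percent change per `step` of x between two endpoints."""
    n_steps = (end_x - start_x) / step
    ratio = (end_value / start_value) ** (1.0 / n_steps)
    return (ratio - 1.0) * 100.0


# Endpoint frequencies at .05 and 1.2 miles, in steps of .25 mile.
total_pct = percent_change_per_step(0.341, 5.096, 0.05, 1.2, 0.25)
pdo_pct = percent_change_per_step(0.144, 1.486, 0.05, 1.2, 0.25)
injury_pct = percent_change_per_step(0.192, 3.201, 0.05, 1.2, 0.25)

print(f"total:  {total_pct:.2f}% per .25 mile")
print(f"PDO:    {pdo_pct:.2f}% per .25 mile")
print(f"injury: {injury_pct:.2f}% per .25 mile")
```

Because the models use a log link, a fixed increase in a predictor multiplies the predicted frequency by a constant factor when the other terms are held fixed, which is why a compound rate rather than a fixed increment describes these curves.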


[Horizontal axis: Intersections]

FIGURE 8.2  Crash Frequencies vs. Intersections (AADT =
15000, Speed Limit = 45, Section Length = .75 miles, Parking
= Present, Pavement Width = 18', Unpaved Shoulder = 3')


As the number of intersections in a section increased
from 0 to 15, the total crash frequency increased from 1.56
to 2.28, the PDO crash frequency increased from .50 to .846,
the injury crash frequency increased from .945 to 1.351 and
the fatal crash frequency decreased from .026 to .020.

The  effect  of  intersections  on  mid-block  crash 
frequencies  is  relatively  low.   This  result  was  expected 
since  the  crashes  that  occur  at  the  intersections  were  not 
included  in  the  analysis.   Figure  8.2  shows  the  curves 
obtained  from  plotting  crash  frequencies  corresponding  to 
various  numbers  of  intersections  per  section. 

For  every  3  intersections,  the  total  crash  frequency 
increased  by  7.9%,  the  PDO  crash  frequency  increased  by 
11.11%,  the  injury  crash  frequency  increased  by  7.41%  and 
the  fatal  crash  frequency  decreased  by  5.63%. 

Operational   Factors: 

AADT,  speed  limit,  and  presence  of  on-street  parking 
were  found  to  be  significant  parameters  in  the  total,  PDO, 
and  injury  crash  prediction  models.   All  these  parameters 
except  presence  of  on-street  parking  were  found  to  be 
significant  in  the  fatal  crash  model. 

As AADT increased from 5000 to 25000, the total crash
frequency increased from .54 to 4.13, the PDO crash
frequency increased from .189 to 1.354, the injury crash
frequency increased from .333 to 2.445 and the fatal crash
frequency increased from .012 to .038.

FIGURE 8.3  Crash Frequencies vs. AADT (Speed Limit = 45,
Section Length = .75 miles, Intersections = 5, Parking =
Present, Pavement Width = 18', Unpaved Shoulder = 3')


Figure  8.3  shows  the  curves  obtained  from  plotting 
crash  frequencies  corresponding  to  various  levels  of  AADT. 
For  every  increase  of  AADT  by  5000,  the  total  crash 
frequency  increased  by  52.74%,  the  PDO  crash  frequency 
increased  by  50.74%,  the  injury  crash  frequency  increased  by 
51.53%  and  the  fatal  crash  frequency  increased  by  26.43%.  Up 
to  an  AADT  of  15000,  the  crash  frequency  increased 
moderately. Beyond an AADT of 15000 the total crash
frequency is seen to increase sharply due to the sharp
increase  in  injury  crash  frequency. 

As  speed  limit  increased,  the  total  crash  frequency, 
PDO  crash  frequency  and  injury  crash  frequency  decreased 
while the fatal crash frequency increased. As speed limit
increased from 30 miles per hour to 55 miles per hour, the
total  crash  frequency  decreased  from  4.94  to  .892,  the  PDO 
crash  frequency  decreased  from  2.11  to  .257,  the  injury 
crash  frequency  decreased  from  2.326  to  .632  and  the  fatal 
crash  frequency  increased  from  .01  to  .05.   At  speed  limits 
below  40  mph,  the  PDO  crashes  are  seen  to  increase  sharply 
resulting  in  a  sharp  rise  in  total  crash  frequency. 

Figure  8.4  shows  the  curves  obtained  from  plotting 
crash  frequencies  corresponding  to  various  levels  of  speed 
limits.   For  every  5  mph  increase  in  speed  limit,  the  total 
crash  frequency  decreased  by  29.0%,  the  PDO  crash  frequency 
decreased  by  34.37%,  the  injury  crash  frequency  decreased  by 
22.94% and the fatal crash frequency increased by 39.81%.

Presence  of  on-street  parking  was  found  to  affect 
total,  PDO,  and  injury  crash  frequencies.   The  total  crash 
frequency  was  4.00  with  no  on-street  parking  and  was  5.29 
with  on-street  parking.  The  PDO  crash  frequency  was  1.28 
with  no  on-street  parking  and  was  1.85  with  on-street 
parking.  The  injury  crash  frequency  was  2.43  with  no  on- 
street  parking  and  was  2.87  with  on-street  parking. 

FIGURE 8.4  Crash Frequencies vs. Speed Limit (Section Length
= .75 miles, AADT = 15000, Intersections = 5, Parking =
Present, Pavement Width = 18', Unpaved Shoulder = 3')


Figure  8.5  shows  the  plots  of  crash  frequency 
corresponding  to  the  absence  of  or  presence  of  on-street 
parking. The percentage increases in crash frequency due to
permitting on-street parking are 32.38% for total crash
frequency, 44.50% for PDO crash frequency and 18.10% for
injury crash frequency. The bars in the chart represent
total crashes, property damage only crashes and injury
crashes, respectively.
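
The percentage increases attributed to on-street parking can be recomputed from the rounded crash frequencies quoted in the text. The Python sketch below does so; small differences from the quoted percentages are expected because the printed frequencies are rounded, and the names used are illustrative.

```python
def percent_increase(without_parking, with_parking):
    """Percent increase in crash frequency when on-street parking is present."""
    return (with_parking - without_parking) / without_parking * 100.0


# (frequency without parking, frequency with parking), as quoted in the text.
frequencies = {
    "total": (4.00, 5.29),
    "PDO": (1.28, 1.85),
    "injury": (2.43, 2.87),
}

increases = {
    crash_type: percent_increase(without, with_parking)
    for crash_type, (without, with_parking) in frequencies.items()
}

for crash_type, pct in increases.items():
    print(f"{crash_type:7s} {pct:6.2f}%")
```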

[Horizontal axis: Parking]

FIGURE 8.5  Crash Frequencies vs. Parking (Section Length =
.75 miles, AADT = 15000, Speed Limit = 45, Intersections =
5, Pavement Width = 18', Unpaved Shoulder = 3')


Cross   Sectional    Factors: 

Lane  width,  paved  shoulder  width  and  unpaved  shoulder 
width  are  the  important  cross  sectional  factors  for  two-lane 
urban highways. The peculiar behavior of lane width
prevented it from being significant in the model as a
continuous variable. Treating lane width as a categorical
variable revealed this behavior, which is discussed in the
next chapter. Based on the model selection strategy adopted
in the regression analysis, pavement width, which is the sum
of lane width and shoulder width, was used instead of either
variable.


[Horizontal axis: Pavement Width (feet)]

FIGURE 8.6  Crash Frequencies vs. Pavement Width (Section
Length = .75 miles, AADT = 15000, Speed Limit = 45,
Intersections = 5, Parking = Present, Unpaved Shoulder = 3')


As  pavement  width  increased,  the  crash  frequencies 
decreased. When pavement width increased from 9 feet to 25
feet in each direction, the total crash frequency decreased
from 2.61 to 1.31, the PDO crash frequency decreased from
.88 to .44, the injury crash frequency decreased from 1.58
to .78 and the fatal crash frequency decreased from .068 to
.011.

Figure 8.6 shows the plot of crash frequencies
corresponding to various levels of pavement width. As
pavement width was allowed to increase in stages of 3 feet,
the total crash frequency decreased by 12.13%, the PDO crash
frequency decreased by 12.14%, the injury crash frequency
decreased by 12.39% and the fatal crash frequency decreased
by 29.12%.

FIGURE 8.7  Crash Frequencies vs. Unpaved Shoulder Width
(Section Length = .75 miles, AADT = 15000, Speed Limit = 45,
Intersections = 5, Parking = Present, Pavement Width = 18')


Unpaved  shoulder  width  was  found  to  have  a  very  strange 
behavior  pattern.   Unpaved  shoulder  is  defined  as  a 
categorical  variable  since  the  regression  analysis  strategy 
chose  this  variable  instead  of  the  continuous  variable  even 
though the degrees of freedom increased by 3. When unpaved
shoulder width was allowed to increase from 0 to 9 feet in
stages  of  3  feet,  crash  frequency  increased  initially  and 
then  decreased  gradually  to  a  value  close  to  that  of  the 
crash  frequency  at  0  feet  width  of  unpaved  shoulder. 

Figure  8.7  shows  the  plot  of  crash  frequencies  for 
various  values  of  unpaved  shoulder.   Based  on  the  shape  of 
the  plot,  it  can  be  concluded  that  it  is  better  not  to 
provide  an  unpaved  shoulder  than  to  provide  a  narrow  unpaved 
shoulder. 


CHAPTER  9 
FOR  FURTHER  STUDIES 


The  prediction  models  developed  from  regression 
analysis  and  the  effect  of  each  parameter  on  crash  frequency 
were  discussed  in  this  report.   Inclusion  of  important  terms 
like  percentage  of  heavy  vehicles  and  categorization  of 
crash  type  based  on  cause  of  occurrence  can  lead  to  better 
model  performance. 

Any  improvement  to  the  model  obtained  either  by  the 
addition  of  one  or  more  terms  or  by  the  modification  of  an 
existing  term  should  be  justified  by  the  improvement  in  the 
model  statistics  as  well  as  the  actual  prediction  error. 
The  model  testing  and  selection  procedure  used  in  this  study 
can  be  used  for  such  studies. 

Some behavioral patterns exhibited by section length and
cross sectional variables are discussed briefly in this
chapter. A few more models are developed by modifying the
final  model.   Though  these  models  are  inferior  to  the  final 
model,  further  studies  may  be  able  to  reveal  valuable 
relationships.   All  models  used  in  this  chapter  are 
developed  using  total  crashes  as  the  response  variable. 

Section Length
Crash frequencies are calculated and plotted for section
lengths ranging from .1 mile to 1.2 miles. Plot 1 in
Figure 9.1 shows the curve representing the effect of
section length on crash frequency. Plot 2 in Figure 9.1
shows the crashes per mile for different values of section
length. A large number of short sections represents
irregularity in highway design, while a small number of long
sections represents uniformity in highway design. According
to this plot, uniformity in highway design is desirable up
to a section length of .5 miles.

FIGURE 9.1  Total Crash Frequency vs. Section Length (AADT =
15000 vpd, Speed Limit = 45 mph, Intersections = 5, Parking
= present, Pavement width = 18', Unpaved shoulder = 3')

Illustration:

If a one-mile stretch of highway consists of 10 sections
of .1 mile each, a total crash frequency of 3.84 can be
expected. By comparison, 4 sections of .25 mile each result
in a total crash frequency of 2.184, 2 sections of .5 mile
each result in 1.97, 1.25 sections of .75 mile each result
in 2.36, and 1 section of 1 mile results in 3.184.

Implications:

(a) Uniformity of design should be maintained for a length
of at least half a mile. (b) Longer highway sections are
associated with a higher crash rate. Though this result is
of little importance to new design or improvement plans,
knowledge of the actual cause can prevent the safety
engineer from drawing wrong conclusions about other design
parameters.

Lane  Width 
A  new  model  is  developed  from  the  final  crash 
prediction  model  by  defining  lane  width  as  a  categorical 
variable. The summary of the t-statistics analysis is given
in the following listing. The p-values suggest that the
level representing 13 feet is significantly different from
the base level of 10 feet at the 5% level, while the level
representing 12 feet is significant only at the 10% level.


Lane Width  Level      Value   Std. Error     t value   p-value
11          lwc1   -0.099261  6.0577e-002  -1.6385936  .1014539
12          lwc2   -0.042720  2.3548e-002  -1.8141509  .0698121
13          lwc3    0.063697  3.0424e-002   2.0936320  .0364213
14          lwc4    0.032517  3.8586e-002   0.8427102  .3994914
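
The p-values in the listing can be approximated from the t values alone. With roughly 1900 residual degrees of freedom, the t distribution is very close to the standard normal, so the Python sketch below uses the normal approximation; that substitution and the function name are assumptions of this sketch, and the results differ from the listed values only in the fourth decimal place.

```python
import math


def two_sided_p(t):
    """Two-sided p-value for a t value, using the standard normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# t values for the lane width levels, copied from the listing.
t_by_level = {
    "lwc1": -1.6385936,
    "lwc2": -1.8141509,
    "lwc3": 2.0936320,
    "lwc4": 0.8427102,
}

p_by_level = {level: two_sided_p(t) for level, t in t_by_level.items()}

for level, p in p_by_level.items():
    flag = "significant at 5%" if p < 0.05 else "not significant at 5%"
    print(f"{level}: p = {p:.4f} ({flag})")
```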


[Horizontal axis: Lane Width (feet)]

FIGURE 9.2  Total Crash Frequency vs. Lane Width (AADT =
15000 vpd, Speed Limit = 45 mph, Intersections = 5, Parking
= present, Pavement width = 18', Unpaved shoulder = 3')

Figure 9.2 shows the curve representing the crash
frequency  corresponding  to  each  level  of  lane  width.  The 
curve  shows  that  as  lane  width  increases  from  10  feet  to  14 
feet,  the  crash  frequency  decreases  and  then  increases. 
When  the  lane  width  is  11  feet,  the  crash  frequency  is  seen 
to  be  a  minimum  which  is  about  11.4%  less  than  that  of  the 
crash  frequency  at  10  feet  width.   As  lane  width  is  further 
increased  to  12  feet  and  13  feet,  the  crash  frequency 
increases  by  3.2%  and  7.1%  respectively.   At  a  14  feet 
width,  the  crash  frequency  decreased  slightly  by  0.7%. 

Paved  Shoulder  Width 

Another  model  is  developed  from  the  final  model  with 
paved  shoulder  width  defined  as  a  categorical  variable.  The 
summary  of  the  t-statistics  analysis  is  given  in  the 
following  listing.   The  p-value  suggests  that  the  levels 
representing  3  feet  and  6  feet  width  are  significantly 
different  from  the  level  representing  0  feet. 

Figure  9.3  shows  the  curve  representing  the  crash 
frequency  corresponding  to  various  levels  of  paved  shoulder 
width.   The  curve  shows  that  as  paved  shoulder  width 
increases,  the  crash  frequency  decreases  and  then  increases. 
Best  results  are  obtained  when  the  paved  shoulder  width  is 
in  the  neighborhood  of  3  to  6  feet.   Crash  frequency  reduced 
by  12.9%  when  a  paved  shoulder  of  about  3  feet  was  provided. 
As paved shoulder width was increased to 6 feet the crash
frequency  remained  almost  the  same  as  that  of  a  3  feet  wide 
paved  shoulder.   When  the  paved  shoulder  was  increased  to  9 
feet,  the  crash  frequency  increased  by  16.8%. 


Opsc  Level      Value     Std. Error     t value   p-value
3     opsc1  -0.134669  3.777730e-002  -3.5648164  .0003727
6     opsc2  -0.179358  5.966281e-002  -3.0062061  .0026783
9     opsc3   0.022395  3.846437e-002   0.5822482  .5604653


[Horizontal axis: Paved Shoulder Width (feet)]

FIGURE 9.3  Total Crash Frequency vs. Paved Shoulder (AADT =
15000 vpd, Speed Limit = 45 mph, Intersections = 5, Parking
= present, Pavement width = 18', Unpaved shoulder = 3')

Unpaved Shoulder Width
The unpaved shoulder width is defined as a categorical
variable with 3 degrees of freedom in the final model.
The summary of the t-statistics analysis is given in the
following listing. The p-value suggests that the level
representing 3 feet is significantly different from the
level representing 0 feet.


Oupsc  Level       Value     Std. Error     t value   p-value
3      oupsc1   0.222361  5.985641e-002   3.7149139  .0002088
6      oupsc2   0.009436  2.856760e-002   0.3303389  .7411792
9      oupsc3  -0.017273  1.682358e-002  -1.0267381  .3046718


[Horizontal axis: Unpaved Shoulder Width (feet)]

FIGURE 9.4  Total Crash Frequency vs. Unpaved shoulder (AADT
= 15000 vpd, Speed Limit = 45 mph, Intersections = 5, Parking
= present, Pavement width = 18', Unpaved shoulder = 3').
Figure 9.4 plots the crash frequency corresponding to each
level of unpaved shoulder width. The curve shows that when a
3-foot unpaved shoulder is introduced, the crash frequency
increases abruptly by about 24.8%, and then drops gradually
as the width is increased. In other words, an unpaved
shoulder of 3 feet should be avoided. If an unpaved shoulder
is provided, it should be at least 6 feet wide.
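
The percentage quoted above can be approximately reproduced
from the coefficient listing if two things are assumed, neither
of which is stated in this section: that the model uses a log
link, and that each level is coded relative to the 0-foot
baseline. Under those assumptions the change for a level is
exp(coefficient) - 1. A minimal sketch (the function name is
hypothetical):

```python
import math

# Coefficients for the unpaved shoulder levels, copied from the listing above.
OUPSC_COEFFICIENTS = {3: 0.222361, 6: 0.009436, 9: -0.017273}


def percent_change(coefficient):
    """Percent change in expected crash frequency relative to the baseline.

    Assumes a log link and coding against the 0-foot level; both are
    assumptions made for this sketch, not statements from the study.
    """
    return (math.exp(coefficient) - 1.0) * 100.0


for width, coefficient in OUPSC_COEFFICIENTS.items():
    print(f"{width} ft: {percent_change(coefficient):+.1f}%")
```

Under these assumptions the 3-foot level comes out at roughly
25%, close to the value quoted in the text, while the 6-foot
and 9-foot levels fall within about 2% of the baseline.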

Raised  Curb 

A raised curb was expected to be a significant parameter
in crash prediction models. In the final model, however, the
logical variable oc, which represents the presence or absence
of a raised curb, was rejected by the step-wise regression
algorithm. The variable was accepted when unpaved shoulder
was excluded from the model.

The  summary  of  the  t-statistics  analysis  is  given  in 
the  following  listing.   The  p-value  suggests  that  the  level 
representing  the  presence  of  a  raised  curb  is  significantly 
different  from  the  level  representing  the  absence  of  a 
raised  curb. 

It was found that the crash frequency increases by
28.6% in the presence of a raised curb. The rejection of this
variable from the model is suspected to result from its
high correlation with the categorical variable that
represents the presence of a narrow unpaved shoulder
about 2 feet wide.


Variable  Value       Std. Error     t value   p-value

oc        0.17053693  9.961155e-002  1.712020  .087421

FIGURE 9.5 Total Crash Frequency vs. Raised Curb (AADT =
15000 vpd, Speed Limit = 45 mph, Intersections = 5, Parking
= present, Pavement width = 18', Unpaved shoulder = 3').


CHAPTER  10 
CONCLUSIONS 


The study of two-lane urban highways resulted in the
development of prediction models that can predict crash
frequency for given geometric and traffic conditions within
an acceptable level of error. The models can be used to
understand the effect of various parameters on safety, which
can be applied in the design of new highway sections and in
checking the safety of existing highway sections. The
procedure used in the analysis can be followed to develop
crash prediction models for other types of highways.

Conclusions 

Based on this study, some conclusions were made. These
conclusions, briefly discussed below, concern the analysis
issues as well as the results of the study.

The uniformity of highway design influences crash
frequency. Therefore, crash rate, a function of AADT, should
not be considered as the dependent variable. Instead, crash
frequency should be chosen as the dependent variable, while
section length, which represents the uniformity of highway
design, should be treated as one of the independent
variables.

The relationship between AADT and crash frequency is
not linear. Since crash rate is also a function of AADT,
this result further justifies the selection of crash
frequency as the dependent variable. AADT should be treated
as another independent variable.
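
One way to see this point is the following sketch. It assumes
a conventional definition of crash rate (crashes per million
vehicle-miles) and a power-law form for crash frequency; the
symbols N (crash frequency), L (section length), R (crash
rate), k and the exponents are illustrative and are not taken
from the fitted models.

```latex
% Conventional crash rate per million vehicle-miles (assumed definition)
R = \frac{10^{6}\, N}{365 \cdot \mathrm{AADT} \cdot L}

% Assumed frequency model with free exponents
N = k \cdot \mathrm{AADT}^{\beta_1} \cdot L^{\beta_2}

% Substituting the second expression into the first
R = \frac{10^{6}\, k}{365} \cdot \mathrm{AADT}^{\beta_1 - 1} \cdot L^{\beta_2 - 1}
```

The rate is free of AADT only when the exponent on AADT equals
1, that is, when frequency is proportional to AADT. A nonlinear
relationship implies a different exponent, so the rate still
varies with traffic volume and does not remove its effect.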

Crash frequency need not always follow a Poisson
distribution. In situations where over-dispersion exists,
a negative binomial distribution should be considered. In the
case of two-lane urban highways, total, PDO, and injury
crash frequencies were found to follow a negative binomial
distribution, while fatal crashes were found to follow a
Poisson distribution.
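
A quick first check for over-dispersion compares the sample
variance of the counts with their mean, since a Poisson
distribution has the two equal. The sketch below is only an
illustration: the counts are invented, the function name is
hypothetical, and this raw check ignores explanatory variables,
so it is not the procedure used in the study.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Ratio of sample variance to sample mean for a list of crash counts.

    Values near 1 are consistent with a Poisson distribution; values
    well above 1 suggest over-dispersion.
    """
    return variance(counts) / mean(counts)


# Hypothetical crash counts per highway section, invented for illustration.
example_counts = [0, 1, 0, 3, 7, 0, 2, 12, 1, 0]
index = dispersion_index(example_counts)
print(f"variance/mean = {index:.2f}")
```

For these invented counts the ratio is well above 1, the
pattern for which a negative binomial distribution would be
considered in place of a Poisson distribution.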

Intersections affect mid-block crashes, but as the
section length increases, the effect of intersections on
crash frequency decreases.

Cross-sectional parameters were found to have a nonlinear
relationship with crash frequency. The results showed that
increasing lane width, paved shoulder width and unpaved
shoulder width beyond certain limits has no advantage and
can adversely affect safety.

The presence of a raised curb was found to increase crash
frequency.
Limitations

Though the models displayed very desirable qualities,
they have some limitations. The effect of the percentage of
heavy vehicles on safety is not known because accurate data
were unavailable. Some accidents are caused by vehicle
failure or by improper driver conditions. In this study,
those types of accidents are not identified or categorized.
Therefore, the models developed using the available data
have some level of unexplained variation.

The results of this study are applicable only to two-lane
urban highways in the State of Florida. Since design
factors can behave differently and affect safety in
different ways for different types of highways, similar
results cannot be expected for other types of highways,
though the same analysis procedure can be used.

The results from the categorical analysis of cross-sectional
parameters, as discussed in the chapter 'For Further
Studies', yielded very good p-values for most of the
variable levels but insignificant p-values for a few
levels. This could be due to the lower number of
observations at such levels, which makes the results
unreliable at those levels.